Title: Improvements in or relating to Starch Content of Plants

Field of the Invention 

This invention relates to novel nucleic acid sequences, vectors and host cells comprising 
the nucleic acid sequence(s), to polypeptides encoded thereby, and to a method of altering 
a host cell by introducing the nucleic acid sequence(s) of the invention. 

Background to the Invention 

Starch consists of two main polysaccharides, amylose and amylopectin. Amylose is a 
linear polymer containing α-1,4 linked glucose units, while amylopectin is a highly 
branched polymer consisting of an α-1,4 linked glucan backbone with α-1,6 linked glucan 
branches. In most plant storage reserves amylopectin constitutes about 75% of the starch 
content. Amylopectin is synthesized by the concerted action of soluble starch synthase and 
starch branching enzyme [α-1,4 glucan: α-1,4 glucan 6-glycosyltransferase, EC 2.4.1.18]. 
Starch branching enzyme (SBE) hydrolyses α-1,4 linkages and rejoins the cleaved glucan, 
via an α-1,6 linkage, to an acceptor chain to produce a branched structure. The physical 
properties of starch are strongly affected by the relative abundance of amylose and 
amylopectin, and SBE is therefore a crucial enzyme in determining both the quantity and 
quality of starches produced in plant systems. 

Starches are commercially available from several plant sources including maize, potato and 
cassava. Each of these starches has unique physical characteristics and properties and a 
variety of possible industrial uses. In maize there are a number of naturally occurring 
mutants which have altered starch composition such as high amylopectin types ("waxy" 
starches) or high amylose starches but in potato and cassava no such mutants exist on a 
commercial basis as yet. 

Genetic modification offers the possibility of obtaining new starches which may have novel 
and potentially useful characteristics. Most of the work to date has involved potato plants 
because they are amenable to genetic manipulation i.e. they can be transformed using 
Agrobacterium and regenerated easily from tissue culture. In addition many of the genes 
involved in starch biosynthesis have been cloned from potato and thus are available as 
targets for genetic manipulation, for example, by antisense inhibition of expression or 
sense suppression. 

Cassava (Manihot esculenta L. Crantz) is an important crop in the tropics, where its 
starch-filled roots are used both as a food source and increasingly as a source of starch. 
Cassava is a high yielding perennial crop that can grow on poor soils and is also tolerant 
of drought. Cassava starch being a root-derived starch has properties similar but not 
identical to potato starch and is composed of 20-25% amylose and 75-80% amylopectin 
(Rickard et al., 1991 Trop. Sci. 31, 189-207). Some of the genes involved in starch 
biosynthesis have been cloned from cassava, including starch branching enzyme I (SBE 
I) (Salehuzzaman et al., 1994 Plant Science 98, 53-62), and granule bound starch synthase 
I (GBSS I) (Salehuzzaman et al., 1993 Plant Molecular Biology 23, 947-962) and some 
work has been done on their expression patterns although only in in vitro grown plants 
(Salehuzzaman et al., 1994 Plant Science 98, 53-62). 

In most plants studied to date e.g. maize (Boyer & Preiss, 1978 Biochem. Biophys. Res. 
Comm. 80, 169-175), rice (Smyth, 1988 Plant Sci. 57, 1-8) and pea (Smith, Planta 175, 
270-279), two forms of SBE have been identified, each encoded by a separate gene. A 
recent review by Burton et al. (1995 The Plant Journal 7, 3-15) has demonstrated that the 
two forms of SBE constitute distinct classes of the enzyme such that, in general, enzymes 
of the same class from different plants may exhibit greater similarity than enzymes of 
different classes from the same plant. In their review, Burton et al. termed the two 
respective enzyme families class "A" and class "B", and the reader is referred thereto (and 
to the references cited therein) for a detailed discussion of the distinctions between the two 
classes. One general distinction of note would appear to be the presence, in class A SBE 
molecules, of a flexible N-terminal domain, which is not found in class B molecules. The 
distinctions noted by Burton et al. are relied on herein to define class A and class B SBE. 

Many organisations have interests in obtaining modified Cassava starches by means of 
genetic modification. This is impossible to achieve however, unless the plant is amenable 
to transformation and regeneration, and the starch biosynthesis genes which are to be 
targeted for modification must be cloned. The production of transgenic cassava plants has 
only recently been demonstrated (Taylor et al., 1996 Nature Biotechnology 14, 726-730; 
Schöpke et al., 1996 Nature Biotechnology 14, 731-735; and Li et al., 1996 Nature 
Biotechnology 14, 736-740). The present invention concerns the identification, cloning 
and sequencing of a starch biosynthetic gene from Cassava, suitable as a target for genetic 
manipulation. 

Summary of the Invention 

In a first aspect the invention provides a nucleic acid sequence encoding a polypeptide 
having starch branching enzyme (SBE) activity, the polypeptide comprising an effective 
portion of the amino acid sequences shown in Figure 4 or Figure 13. The nucleic acid 
is conveniently in substantial isolation, especially in isolation from other naturally 
associated nucleic acid sequences. 

An "effective portion" of the amino acid sequences may be defined as a portion which 
retains sufficient SBE activity when expressed in E. coli KV832 to complement the 
branching enzyme mutation therein. The amino acid sequences shown in Figures 4 and 
13 include the N terminal transit peptide, which comprises about the first 50 amino acid 
residues. As those skilled in the art will be well aware, such a transit peptide is not 
essential for SBE activity. Thus the mature polypeptide, lacking a transit peptide, may 
be considered as one example of an effective portion of the amino acid sequence shown 
in Figure 4 or Figure 13. 

Other effective portions may be obtained by effecting minor deletions in the amino acid 
sequence, whilst substantially preserving SBE activity. Comparison with known class A 
SBE sequences, with the benefit of the disclosure herein, will enable those skilled in the 
art to identify regions of the polypeptide which are less well conserved and so amenable 
to minor deletion, or amino acid substitution (particularly, conservative amino acid 
substitution) whilst substantially preserving SBE activity. Such less well-conserved 
regions are generally found in the N terminal amino acid residues (up to the triple proline 
"elbow" at residues 138-140 in Figure 4 and up to the proline elbow at residues 143-145 
in Figure 13) and in the last 50 residues or so of the C terminal, and in particular in the 
acidic tail of the C terminal. 

Conveniently the nucleic acid sequence is obtainable from cassava, preferably obtained 
therefrom, and typically encodes a polypeptide obtainable from cassava. In a particular 
embodiment, the encoded polypeptide may have the amino acid sequence NSKH at about 
position 697 (in relation to Figure 4), which sequence appears peculiar to an isoform of 
the SBE class A enzyme of cassava, other class A SBE enzymes having the conserved 
sequence DA D/E Y (Burton et al., 1995 cited above). 

In a particular aspect of the invention there is provided a nucleic acid comprising a portion 
of nucleotides 21 to 2531 of the nucleic acid sequence shown in Figure 4, or a functionally 
equivalent nucleic acid sequence. Such functionally equivalent nucleic acid sequences 
include, but are not limited to, those sequences which encode substantially the same amino 
acid sequence but which differ in nucleotide sequence from that shown in Figure 4 by 
virtue of the degeneracy of the genetic code. For example, a nucleic acid sequence may 
be altered (e.g. "codon optimised") for expression in a host other than cassava, such that 
the nucleotide sequence differs substantially whilst the amino acid sequence of the encoded 
polypeptide is unchanged. Other functionally equivalent nucleic acid sequences are those 
which will hybridise under stringent hybridisation conditions (e.g. as described by 
Sambrook et al., Molecular Cloning: A Laboratory Manual, CSH, i.e. washing with 
0.1xSSC, 0.5% SDS at 68°C) with the sequence shown in Figure 4. Figure 10 shows a 
functionally equivalent sequence designated "125 + 94", which includes a region 
corresponding to the 3' coding portion of the sequence in Figure 4. Figure 13 shows a 
functionally equivalent sequence which comprises a second complete SBE coding sequence 
(the SBE-derived sequence is from nucleotides 35 to 2760, of which the coding sequence 
is nucleotides 131-2677; the rest of the sequence in the figure is vector-derived). 
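
By way of illustration only, the degeneracy of the genetic code referred to above can be sketched in a few lines of Python. The two 12-nucleotide stretches below are hypothetical and are not taken from Figure 4; they are chosen simply so that both encode the peptide NSKH while differing at half of their positions. The table is the standard genetic code, and the names used are assumptions of this sketch.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Hypothetical sequences for illustration (not from the figures).
variant_1 = "AATTCTAAACAT"
variant_2 = "AACAGCAAGCAC"

# Number of nucleotide positions at which the two variants differ.
differences = sum(a != b for a, b in zip(variant_1, variant_2))
```

Both variants translate to the same four residues even though the nucleotide sequences differ, which is the sense in which a codon-altered sequence remains functionally equivalent.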

Functionally equivalent DNA sequences will preferably comprise at least 200-300bp, more 
preferably 300-600bp, and will exhibit at least 88% identity (more preferably at least 90%, 
and most preferably at least 95% identity) with the corresponding region of the DNA 
sequence shown in Figures 4 or 10. Those skilled in the art will readily be able to conduct 
a sequence alignment between the putative functionally equivalent sequence and those 
detailed in Figures 4 or 10 - the identity of the two sequences is to be compared in those 
regions which are aligned by standard computer software, which aligns corresponding 
regions of the sequences. 
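
The comparison described above can be sketched as follows, assuming the two sequences have already been aligned by some external software and are supplied as equal-length strings with "-" for gaps. The function names and the treatment of gap columns (excluded from the aligned region) are assumptions made for this sketch, not a definition taken from the specification.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (aligned length, % identity) over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        return 0, 0.0
    matches = sum(a == b for a, b in pairs)
    return len(pairs), 100.0 * matches / len(pairs)


def meets_threshold(aligned_a, aligned_b, min_length=200, min_identity=88.0):
    """Apply the 200 bp / 88% figures quoted in the text to an aligned pair."""
    length, identity = percent_identity(aligned_a, aligned_b)
    return length >= min_length and identity >= min_identity
```

Raising `min_identity` to 90.0 or 95.0 gives the more preferred thresholds mentioned in the same paragraph.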

In particular embodiments the nucleic acid sequence may alternatively comprise a 5' 
and/or a 3' untranslated region ("UTR"), examples of which are shown in Figures 2 and 
4. Figure 9 includes a 3' UTR as nucleotides 688-1044 and Figure 10 includes a 3' UTR 
as nucleotides 1507-1900 (which nucleotides correspond to the first base after the "stop" 
codon to the base immediately preceding the poly (A) tail). Any one of the sequences 
defined above, or a functional equivalent thereof (as defined by hybridisation properties, 
as set out in the preceding paragraph), could be useful in sense or anti-sense inhibition of 
corresponding genes, as will be apparent to those skilled in the art. It will also be 
apparent to those skilled in the art that such regions may be modified so as to optimise 
expression in a particular type of host cell and that the 5' and/or 3' UTRs could be used 
in isolation, or in combination with a coding portion of the sequence of the invention. 
Similarly, a coding portion could be used without a 5' or a 3' UTR if desired. 

In a further aspect, the invention provides a replicable nucleic acid construct comprising 
any one of the nucleic acid sequences defined above. The construct will typically 
comprise a selectable marker and may allow for expression of the nucleic acid sequence 
of the invention. Conveniently the vector will comprise a promoter (especially a promoter 
sequence operable in a plant and/or a promoter operable in a bacterial cell) and one or 
more regulatory signals known to those skilled in the art. 

In another aspect the invention provides a polypeptide having SBE activity, the polypeptide 
comprising an effective portion of the amino acid sequence shown in Figure 4 or Figure 
13. The polypeptide is conveniently one obtainable from cassava, although it may be 
derived using recombinant DNA techniques. The polypeptide is preferably in substantial 
isolation from other polypeptides of plant origin, and more preferably in substantial 
isolation from any other polypeptides. The polypeptide may have amino acid residues 
NSKH at about position 697 (in the sequence shown in Figure 4), instead of the sequence 
DA D/E Y found in other SBE class A polypeptides. The polypeptide may be used in a 
method of modifying starch in vitro, the method comprising treating starch under suitable 
conditions (of temperature, pH etc.) with an effective amount of the polypeptide. 

Those skilled in the art will appreciate that the disclosure of the present specification can 
be utilised in a number of ways. In particular, the characteristics of a host cell may be 
altered by recombinant DNA techniques. Thus, in a further aspect, there is provided a 
method by which a host cell may be altered by introduction of a nucleic acid sequence 
comprising at least 200bp and exhibiting at least 88% sequence identity (more preferably 
at least 90%, and most preferably at least 95% identity) with the corresponding region of 
the DNA sequence shown in Figures 4, 9, 10 or 13, operably linked in the sense or 
(preferably) in the anti-sense orientation to a suitable promoter active in the host cell, and 
causing transcription of the introduced nucleic acid sequence, said transcript and/or the 
translation product thereof being sufficient to interfere with the expression of a 
homologous gene naturally present in said host cell, which homologous gene encodes a 
polypeptide having SBE activity. The altered host cell is typically a plant cell, such as a 
cell of a cassava, banana, potato, sweet potato, tomato, pea, wheat, barley, oat, maize, 
or rice plant. 

Desirably the method further comprises the introduction of one or more nucleic acid 
sequences which are effective in interfering with the expression of other homologous gene 
or genes naturally present in the host cell. Such other genes whose expression is inhibited 
may be involved in starch biosynthesis (e.g. an SBE I gene), or may be unrelated to SBE 
II. 

Those skilled in the art will be aware that both anti-sense inhibition, and "sense 
suppression" of expression of genes, especially plant genes, has been demonstrated (e.g. 
Matzke & Matzke 1995 Plant Physiol. 107, 679-685). 

It is believed that antisense methods are mainly operable by the production of antisense 
mRNA which hybridises to the sense mRNA, preventing its translation into functional 
polypeptide, possibly by causing the hybrid RNA to be degraded (e.g. Sheehy et al., 1988 
PNAS 85, 8805-8809; Van der Krol et al., Mol. Gen. Genet. 220, 204-212). Sense 
suppression also requires homology between the introduced sequence and the target gene, 
but the exact mechanism is unclear. It is apparent however that, in relation to both 
antisense and sense suppression, neither a full length nucleotide sequence, nor a "native" 
sequence is essential. Preferably the nucleic acid sequence used in the method will 
comprise at least 200-300bp, more preferably at least 300-600bp, of the full length 
sequence, but by simple trial and error other fragments (smaller or larger) may be found 
which are functional in altering the characteristics of the plant. It is also known that 
untranslated portions of sequence can suffice to inhibit expression of the homologous gene 
- coding portions may be present within the introduced sequence, but they do not appear 
to be essential under all circumstances. 

The inventors have discovered that there are at least two class A SBE genes in cassava. 
A fragment of a second gene has been isolated, which fragment directs the expression of 
the C terminal 481 amino acids of cassava class A SBE (see Figure 10) and comprises a 
3' untranslated region. Subsequently, a complete clone of the second gene was also 
recovered (see Figure 12). The coding portions of the two genes show some slight 
differences, and the second SBE gene may be considered as functionally equivalent to the 
corresponding portion of the nucleotide sequence shown in Figure 4. However, the 3' 
untranslated regions of the two genes show marked differences. Thus the method of 
altering a host cell may comprise the use of a sufficient portion of either gene so as to 
inhibit the expression of the naturally occurring homologous gene. Conveniently, a 
portion of nucleotide sequence is employed which is conserved between both genes. 
Alternatively, sufficient portions of both genes may be employed; typically using a single 
construct to direct the transcription of both introduced sequences. 

In addition, as explained above, it may be desired to cause inhibition of expression of the 
class B SBE (i.e. SBE I) in the same host cell. A number of class B SBE gene sequences 
are known, including portions of the cassava class B SBE (Salehuzzaman et al., 1994 
Plant Science 98, 53-62) and any one of these may prove suitable. Preferably the 
sequence used is that which derives from the host cell sought to be altered (e.g. when 
altering the characteristics of a cassava plant cell, it is generally preferred to use sense or 
anti-sense sequences corresponding exactly to at least portions of the cassava gene whose 
expression is sought to be inhibited). 

In a further aspect the invention provides an altered host cell, into which has been 
introduced a nucleic acid sequence comprising at least 200bp and exhibiting at least 88% 
sequence identity (more preferably at least 90%, and most preferably at least 95% identity) 
with the corresponding region of the DNA sequence shown in Figures 4, 9, 10 or 13, 
operably linked in the sense or anti-sense orientation to a suitable promoter, said host cell 
comprising a natural gene sharing sequence homology with the introduced sequence. 

The host cell may be a micro-organism (such as a bacterial, fungal or yeast cell) or a plant 
cell. Conveniently the host cell altered by the method is a cell of a cassava plant, or 
another plant with starch storage reserves, such as banana, potato, sweet potato, tomato, 
pea, wheat, barley, oat, maize, or rice plant. Typically the sequence will be introduced 
in a nucleic acid construct, by way of transformation, transduction, micro-injection or 
other method known to those skilled in the art. The invention also provides for a plant 
into which has been introduced a nucleic acid sequence of the invention, or the progeny 
of such a plant. 

The altered plant cell will preferably be grown into an altered plant, using techniques of 
plant growth and cultivation well-known to those skilled in the art of re-generating 
plantlets from plant cells. 

The invention also provides a method of obtaining starch from an altered plant, the plant 
being obtained by the method defined above. Starch may be extracted from the plant by 
any of the known techniques (e.g. milling). The invention further provides starch 
obtainable from a plant altered by the method defined above, the starch having altered 
properties compared to starch extracted from an equivalent but unaltered plant. 
Conveniently the altered starch is obtained from an altered plant selected from the group 
consisting of cassava, potato, pea, tomato, maize, wheat, barley, oat, sweet potato and 
rice. Typically the altered starch will have increased amylose content. 

The invention will now be further described by way of illustrative examples and with 
reference to the accompanying drawings, in which:- 

Figure 1 is a schematic illustration of the cloning strategy for cassava SBE II. The top line 
represents the size of a full length clone with distances in kilobases (kb) and arrows 
representing oligonucleotides (rightward pointing arrows are sense strand, leftward are on 
opposite strand). The long thick arrow is the open reading frame with start and stop 
codons shown. Below this are shown the 3' RACE, 5' RACE and PCR clones identified 
either by the plasmid name (shown in brackets above the line) or the clone number (shown 
to the left of the clone) for the 5' RACE only. Also shown (by an x) in the 5' RACE 
clones are positions of small deletions or introns. 

Figure 2 shows the DNA sequence and predicted ORF of csbe2con.seq. This sequence 
is a consensus of 3' RACE pSJ94 and 5' RACE clones 27/9,11 and 2ft. The first 64 base 
pairs are derived from the RoRidT17 adaptor primer/dT tail followed by the SBE 
sequence. The one long open reading frame is shown in one letter code below the double 
strand DNA sequence. Also shown is the upstream ORF (MQL...LPW). 

Figure 3 shows an alignment of the 5' region of cassava SBE II csbe2con and pSJ99 
(clones 20 and 35) DNA sequences. Differences from the consensus sequence are shaded. 

Figure 4 shows the DNA sequence and predicted ORF of full length cassava SBE II tuber 
cDNA in pSJ107. The sequence shown is from the CSBE214 to the CSBE218 
oligonucleotide. The DNA sequence is sequence ID No. 28 in the attached sequence 
listing; the amino acid sequence is Seq ID No. 29. 

Figure 5 shows an alignment of the 3' region of cassava SBE II pSJ116 and 125+94 DNA 
sequences. The top line is the 125+94 sequence and the bottom the pSJ116 sequence. 
Identical nucleotides are indicated by the same letter in the middle line, differences are 
indicated by a gap, and dashed lines indicate gaps introduced to optimise alignment. 



Figure 6 shows an alignment of carboxy terminal region of pSJ116 and 125+94 protein 
sequences. The top sequence is from 125+94 and the bottom from pSJ116. Identical 
amino acid residues are shown with the same letter, conserved changes with a colon and 
neutral changes with a period. 

Figure 7 shows a phylogenetic tree of starch branching enzyme proteins. The length of 
each pair of branches represents the distance between sequence pairs. The scale beneath 
the tree measures the distance between sequences (units indicate the number of substitution 
events). Dotted lines indicate a negative branch length because of averaging the tree. 
Zmconl2.pro is maize SBE II, psstbl.pro is pea SBE I (Bhattacharyya et al., 1990 Cell 60, 
115-121) and atsbel-l & 2-2.pro are two SBE II proteins from Arabidopsis thaliana 
(Fisher et al., 1996 Plant Mol. Biol. 30, 97-108). SJ107.pro is representative of a cassava 
SBE II sequence, and potsbe2.pro is a potato SBE II sequence known to the inventors. 

Figure 8 is an alignment of SBE II proteins. Protein sequences are indicated in one letter 
code. The top line represents the consensus sequence, below which is shown the 
consensus ruler and the individual SBE II sequences. Residues matching the consensus 
are shaded. Dashes represent gaps introduced to optimise alignment. Sequence identities 
are shown at the right of the figure and are as Figure 7, except that SJ107.pro is cassava 
SBE II. 

Figure 9 shows the DNA sequence and predicted ORF of a cassava SBE II cDNA isolated 
by 3' RACE (plasmid pSJ101). 

Figure 10 shows the consensus DNA sequence and predicted ORF of a second cassava 
SBE II cDNA isolated by 3' and 5' RACE (sequence designated 125+94 is from plasmid 
pSJ125 and pSJ94, spliced at the CSBE217 oligo sequence). 

Figure 11 is a schematic diagram of the plant transformation vector pSJ64. The black line 
represents the DNA sequence. The hashed line represents the bacterial plasmid backbone 
(containing the origin of replication and bacterial selection marker) and is not shown in 
full. The filled triangles represent the T-DNA borders (RB = right border, LB = left 
border). Relevant restriction enzyme sites are shown above the black line with the 
approximate distances (in kilobases) between sites marked by an asterisk shown 
underneath. The thinnest arrows represent polyadenylation signals (pAnos = nopaline 
synthase, pAg7 = Agrobacterium gene 7), the intermediate arrows represent protein 
coding regions (SBE II = cassava SBE II, HYG = hygromycin resistance gene) and the 
thick arrows represent promoter regions (P-2x35S = double CaMV 35S promoter, P-nos 
= nopaline synthase promoter). 

Figure 12 is a schematic illustration of the cloning strategy used to isolate a second 
cassava SBE II gene. The top line represents the size of a full length clone with distances 
in kilobases (kb) and arrows representing oligonucleotides (rightward pointing arrows are 
sense strand, leftward are on opposite strand). The long thick arrow is the open reading 
frame with start and stop codons shown. Below this are shown the 3' RACE, 5' RACE 
and PCR clones identified either by the plasmid name (shown in brackets above the line) 
or the clone number (shown to the right of the clone). 

Figure 13 shows the DNA sequence and predicted ORF of a second full length cassava 
SBE II tuber cDNA in pSJ146. Nucleotides 35-2760 are SBE II sequence and the 
remainder are from the pT7Blue vector. The DNA sequence of Figure 13 is Seq ID No. 
30, and the amino acid sequence is Seq ID No. 31, in the attached sequence listing. 

Example 1 

This example relates to the isolation and cloning of SBE II sequences from cassava. 

Recombinant DNA manipulations 

Standard procedures were performed essentially according to Sambrook et al. (1989 
Molecular cloning: A laboratory manual, 2nd edn. Cold Spring Harbor Laboratory Press, 
Cold Spring Harbor, N.Y.). DNA sequencing was performed on an ABI automated DNA 
sequencer and sequences manipulated using DNASTAR software for the Macintosh. 



WO 98/20145    PCT/GB97/03032 

Rapid Amplification of cDNA ends (RACE) and PCR conditions 

5' and 3' RACE were performed essentially according to Frohman et al. (1988 Proc. 
Natl. Acad. Sci. USA 85, 8998-9002) but with the following modifications. 

For 3' RACE, 5 µg of total RNA was reverse transcribed using 5 pmol of the RACE 
adaptor RoRidT17 as primer and Stratascript RNAse H- reverse transcriptase (50 U) in 
a 50 µl reaction according to the manufacturer's instructions (Stratagene). The reaction 
was incubated for 1 hour at 37°C and then diluted to 200 µl with TE (10 mM Tris HCl, 
1 mM EDTA) pH 8 and stored at 4°C. 2.5 µl of this cDNA was used in a 25 µl PCR 
reaction with 12.5 pmol of SBE A and Ro primers for 30 cycles of 94°C 45 sec, 50°C 
25 sec, 72°C 1 min 30 sec. A second round of PCR (25 cycles) was performed using 1 
µl of this reaction as template in a 50 µl reaction under the same conditions. Amplified 
products were separated by agarose gel electrophoresis and cloned into the pT7Blue vector 
(Invitrogen). 
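
As a rough check on the quantities quoted for the first-round 3' RACE PCR, the arithmetic can be sketched as below. The sketch assumes that 12.5 pmol refers to each primer, treats the cDNA purely as a proportional share of the input RNA, and ignores ramping times between temperature steps; the function names are illustrative only.

```python
def micromolar(pmol, volume_ul):
    """pmol per microlitre is numerically equal to micromolar."""
    return pmol / volume_ul


def rna_equivalent_ng(rna_ug, diluted_volume_ul, aliquot_ul):
    """Input RNA (in ng) represented by an aliquot of the diluted cDNA."""
    return rna_ug * 1000.0 * aliquot_ul / diluted_volume_ul


def cycling_minutes(cycles, step_seconds):
    """Total hold time across all cycles, excluding ramping."""
    return cycles * sum(step_seconds) / 60.0


primer_um = micromolar(12.5, 25)              # each primer in the 25 µl PCR
template_ng = rna_equivalent_ng(5, 200, 2.5)  # 2.5 µl of the 200 µl diluted cDNA
hold_min = cycling_minutes(30, (45, 25, 90))  # 94°C 45 s, 50°C 25 s, 72°C 90 s
```

On these assumptions each primer is present at 0.5 µM, the template corresponds to 62.5 ng of input RNA, and the 30 cycles account for 80 minutes of hold time.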

For the first round of 5' RACE, 5 µg of total leaf RNA was reverse transcribed as 
described above using 10 pmol of the SBE II gene specific primer CSBE22. This primer 
was removed from the reaction by diluting to 500 µl with TE and centrifuging twice 
through a Centricon 100 microconcentrator. The concentrated cDNA was then dA-tailed 
with 9 U of terminal deoxynucleotide transferase and 50 µM dATP in a 20 µl reaction in 
buffer supplied by the manufacturer (BRL). The reaction was incubated for 10 min at 
37°C and 5 min at 65°C and then diluted to 200 µl with TE pH 8. PCR was performed 
in a 50 µl volume using 5 µl of tailed cDNA, 2.5 pmol of RoRidT17 and 25 pmol of Ro 
and CSBE24 primers for 30 cycles of 94°C 45 sec, 55°C 25 sec, 72°C 3 min. Amplified 
products were separated on a 1% TAE agarose gel, cut out, 200 µl of TE was added and 
melted at 99°C for 10 min. Five µl of this was re-amplified in a 50 µl volume using 
CSBE25 and Ri as primers and 25 cycles of 94°C 45 sec, 55°C 25 sec, 72°C 1 min 30 
sec. Amplified fragments were separated on a 1% TAE agarose gel, purified on DEAE 
paper and cloned into pT7Blue. 

The second round of 5' RACE was performed using CSBE28 and 29 primers in the first 
and second round PCR reactions respectively using a new A-tailed cDNA library primed 
with CSBE27. 

A third round of 5' RACE was performed on the same CSBE27 primed cDNA. 

Repeat 3' RACE and PCR Cloning 

The 3' RACE library (RoRidT17 primed leaf RNA) was used as a template. The first PCR 
reaction was diluted 1:20 and 1 µl was used in a 50 µl PCR reaction with SBE A and Ri 
primers and the products were cloned into pT7Blue. The cloned PCR products were 
screened for the presence or absence of the CSBE23 oligo by colony PCR. 

A full length cDNA of cassava SBE II was isolated by PCR from leaf or root cDNA 
(RoRidT17 primed) using primers CSBE214 and CSBE218 from 2.5 µl of cDNA in a 25 
µl reaction and 30 cycles of 94°C 45 sec, 55°C 25 sec, 72°C 2 min. 

Complementation of E. coli mutant KV832 

SBE II containing plasmids were transformed into the branching enzyme deficient mutant 
E. coli KV832 (Kiel et al., 1987 Mol. Gen. Genet. 207, 294-301) and cells grown on 
solid PYG media (0.85 % KH2PO4, 1.1 % K2HPO4, 0.6 % yeast extract) containing 1.0 
% glucose. To test for complementation, a loop of cells was scraped off and resuspended 
in 150 µl water to which was added 15 µl of Lugol's solution (2 g KI and 1 g I2 per 300 
ml water). 

RNA isolation 

RNA was isolated from cassava plants by the method of Logemann (1987 Anal. Biochem. 
163, 21-26). Leaf RNA was isolated from 0.5 gm of in vitro grown plant tissue. The 
total yield was 300 µg. Three month old roots (88 gm) were used for isolation of root 
RNA. 

SBE II specific oligonucleotides 

SBE A ATGGACAAGGATATGTATGA (Seq ID No. 1) 

CSBE21 GGTTTCATGACTTCTGAGCA (Seq ID No. 2) 

CSBE22 TGCTCAGAAGTCATGAAACC (Seq ID No. 3) 

CSBE23 TCCAGTCTCAATATACCTCG (Seq ID No. 4) 

CSBE24 AGGAGTAGATGGTCTGTCGA (Seq ID No. 5) 

CSBE25 TCATACATATCCTTGTCCAT (Seq ID No. 6) 

CSBE26 GGGTGACTTCAATGATGTAC (Seq ID No. 7) 

CSBE27 GGTGTACATCATTGAAGTCA (Seq ID No. 8) 

CSBE28 AATTACTGGCTCCGTACTAC (Seq ID No. 9) 

CSBE29 CATTCCAACGTGCGACTCAT (Seq ID No. 10) 

CSBE210 TACCGGTAATCTAGGTGTTG (Seq ID No. 11) 

CSBE211 GGACCTTGGTTTAGATCCAA (Seq ID No. 12) 

CSBE212 ATGAGTCGCACGTTGGAATG (Seq ID No. 13) 

CSBE213 CAACACCTAGATTACCGGTA (Seq ID No. 14) 

CSBE214 TTACTTGCGTCAGTTCTCAC (Seq ID No. 15) 

CSBE215 AATATCTATCTCAGCCGGAG (Seq ID No. 16) 

CSBE216 ATCTTAGATAGTCTGCATCA (Seq ID No. 17) 

CSBE217 TGGTTGTTCCCTGGAATTAC (Seq ID No. 18) 

CSBE218 TGCAAGGACCGTGACATCAA (Seq ID No. 19) 
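
Several of the oligonucleotides listed above appear to be exact reverse complements of one another, i.e. sense and antisense primers to the same 20-base site. A minimal sketch for checking this is given below; it uses three pairs whose sequences are taken from the list, and the choice of which primers to pair is an observation made for this sketch rather than a statement in the specification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequences as listed above (internal spaces removed).
PRIMERS = {
    "SBE A": "ATGGACAAGGATATGTATGA",
    "CSBE25": "TCATACATATCCTTGTCCAT",
    "CSBE29": "CATTCCAACGTGCGACTCAT",
    "CSBE212": "ATGAGTCGCACGTTGGAATG",
    "CSBE210": "TACCGGTAATCTAGGTGTTG",
    "CSBE213": "CAACACCTAGATTACCGGTA",
}

# Pairs suspected to target the same site on opposite strands.
PAIRS = [("SBE A", "CSBE25"), ("CSBE29", "CSBE212"), ("CSBE210", "CSBE213")]

complementary = {
    pair: reverse_complement(PRIMERS[pair[0]]) == PRIMERS[pair[1]]
    for pair in PAIRS
}
```

The same check is a convenient way to spot transcription errors in a primer list, since a single wrong base in either member of a pair breaks the match.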



RESULTS 

Cloning of a SBE II gene from cassava 

The strategy for cloning a full length cDNA of starch branching enzyme II of cassava is 
shown in Figure 1. A comparison of several SBE II (class A) DNA sequences 
identified a 23 bp region which appears to be completely conserved among most genes 
(data not shown) and is positioned about one kilobase upstream from the 3' end of the 
gene. An oligonucleotide primer (designated SBE A) was made to this sequence and used 
to isolate a partial cDNA clone by 3' RACE PCR from first strand leaf cDNA as 
illustrated in Figure 1. An approximately 1100 bp band was amplified, cloned into 
pT7Blue vector and sequenced. This clone was designated pSJ94 and contained a 1120 
bp insert starting with the SBE A oligo and ending with a polyA tail. There was a 
predicted open reading frame of 235 amino acids which was highly homologous (79% 
identical) to a potato SBE II also isolated by the inventors (data not shown) suggesting that 
this clone represented a class A (SBE II) gene. 

To obtain the sequence of a full length clone nested primers were made complementary 
to the 5' end of this sequence and used in 5' RACE PCR to isolate clones from the 5' 
region of the gene. A total of three rounds of 5' RACE was needed to determine the 
sequence of the complete gene (i.e. one that has a predicted long ORF preceded by stop 
codons). It should be noted that during this cloning process several clones (# 23, 9, 16) 
were obtained that had small deletions and in one case (clone 23) there was also a small 
(120 bp) intron present. These occurrences are not uncommon and probably arise through 
errors in the PCR process and/or reverse transcription of incompletely processed RNA 
(heterogeneous nuclear RNA). 

The overlapping cDNA fragments could be assembled into a contiguous 3 kb sequence
(designated csbe2con.seq) which contained one long predicted ORF as shown in Figure
2. Several clones in the last round of 5' RACE were obtained which included sequence
of the untranslated leader (UTL). All of these clones had an ORF (42 amino acids) 46 bp
upstream and out of frame with that of the long ORF.
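The frame relationship stated above can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that "46 bp upstream" is measured from the first base of the upstream ATG to the first base of the ATG of the long ORF, and the function name `uorf_geometry` is hypothetical.

```python
def uorf_geometry(offset_bp: int, uorf_codons: int) -> dict:
    """Relate an upstream ORF to the start codon of the main ORF.

    offset_bp   -- distance from the upstream ATG to the main ATG
                   (assumed convention: first base to first base)
    uorf_codons -- sense codons in the upstream ORF, stop codon excluded
    """
    # Two start codons share a reading frame only if they are a multiple of 3 apart.
    in_frame = offset_bp % 3 == 0
    # Length of the upstream ORF in nucleotides, including its stop codon.
    uorf_nt = 3 * (uorf_codons + 1)
    # Positive value: the upstream ORF ends this many nt past the first base of the main ATG.
    overlap_nt = uorf_nt - offset_bp
    return {"in_frame": in_frame, "uorf_nt": uorf_nt, "overlap_nt": overlap_nt}


result = uorf_geometry(offset_bp=46, uorf_codons=42)
print(result)
```

Under that assumption the two start codons are not a multiple of three bases apart, which agrees with the statement that the short ORF is out of frame with the long ORF, and the short ORF extends past the start of the long ORF.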

There is more than one SBE II gene in cassava

In order to determine if the assembled sequence represented that of a single gene, attempts
were made to recover by PCR a full length SBE II gene using primers CSBE214 and
CSBE23 at the 5' and 3' ends of the csbe2con sequence respectively. All attempts were
unsuccessful using either leaf or root cDNA as template. The PCR was therefore repeated
with either the 5'- or 3'-most primer and complementary primers along the length of the
SBE II gene to determine the size of the largest fragment that could be amplified. With
the CSBE214 primer, fragments could be amplified using primers 210, 28, 27 and 22 in
order of increasing distance, the latter primer pair amplifying a 2.2 kb band. With the 3'
primer CSBE23, only primer pairs with 21 and 26 gave amplification products, the latter
being about 1200 bp. These results suggest that the original 3' RACE clone (pSJ94) is
derived from a different SBE II gene than the rest of the 5' RACE clones even though the
two largest PCR fragments (214+22 and 26+23) overlap by 750 bp and share several
primer sites. It is likely that the sequence of the two genes starts to diverge around the
CSBE22 primer site such that the 3' end of the corresponding gene does not contain the
23 primer and is not therefore able to amplify a cDNA when used with the 214 primer.

To confirm this, the sequence of the longest 5' PCR fragment (214+22) from two clones
(#20, designated pSJ99, and #35) was determined and compared to the consensus sequence
csbe2con as shown in Figure 3. The first 2000 bases are nearly identical (the single base
changes might well be PCR errors); however, the consensus sequence is significantly
different after this. This region corresponds to the original 3' RACE fragment pSJ94
(SBE A + Ri adaptor) and provided evidence that there may be more than one SBE II
gene in cassava.

The 3' end corresponding to pSJ99 was therefore cloned as follows: 3' RACE PCR was
performed on leaf cDNA using the SBE A oligo as the gene specific primer so that all
SBE II genes would be amplified. The cloned DNA fragments were then screened for the
presence or absence of the CSBE23 primer by PCR. Two out of 15 clones were positive
with the SBE A + Ri primer pair but negative with SBE A + CSBE23 primers. The
sequence of these two clones (designated pSJ101, as shown in Figure 9) demonstrated that
they were indeed from an SBE II gene and that they were different from pSJ94. However,
the overlapping region of pSJ101 (the 3' clone) and pSJ99 (the 5' clone) was identical,
suggesting that they were derived from the same gene.

To confirm this, a primer (CSBE218) was made to a region in the 3' UTR (untranslated
region) of pSJ101 and used in combination with the CSBE214 primer to recover by PCR a full
length cDNA from both leaf and root cDNA. These clones were sequenced and
designated pSJ106 and pSJ107 respectively. The sequence and predicted ORF of pSJ107
is shown in Figure 4. The long ORF in plasmid pSJ106 was found to be interrupted by
a stop codon (presumably introduced in the PCR process) approximately 1 kb from the 3'
end of the gene, therefore another cDNA clone (designated pSJ116) was amplified in a
separate reaction, cloned and sequenced. This clone had an intact ORF (data not shown).
There were only a few differences in these two sequences (in the transit peptide aa 27-41:
YRRTSSCLSFNFKEA to ORRTSSCLSFIFKKAA and L831 in pSJ107 to V in pSJ116
respectively).

An additional 740 bp of sequence of the gene corresponding to the pSJ94 clone was
isolated by 5' RACE using the primers CSBE216 and 217, and was designated pSJ125.
This sequence was combined with that of pSJ94 to form a consensus sequence "125 +
94", as shown in Figure 10. The sequence of this second gene is about 90% identical at
the DNA and protein level to pSJ116, as shown in Figures 5 and 6, and is clearly a second
form of SBE II in cassava. The 3' untranslated regions of the two genes are not related
(data not shown).

It was also determined that the full length cassava SBE II genes (from both leaf and tuber)
actually encode active starch branching enzymes since the cloned genes were able to
complement the glycogen branching enzyme deficient E. coli mutant KV832.

Main Findings 

1) A full length cDNA clone of a starch branching enzyme II (SBE II) gene has been
cloned from leaves and starch storing roots of cassava. This cDNA encodes an 836 amino
acid protein (Mr 95 kD) and is 86% identical to pea SBE I over the central conserved
domain, although the level of sequence identity over the entire coding region is lower than
86%.

2) There is more than one SBE II gene in cassava as a second partial SBE II cDNA was
isolated which differs slightly in the protein coding region from the first gene and has no
homology in the 3' untranslated region.

3) The isolated full length cDNA from both leaves and roots encodes an active SBE as
it complements an E. coli mutant deficient in glycogen branching enzyme as assayed by
iodine staining.

We have shown that there are SBE II (Class A) gene sequences present in the cassava
genome by isolating cDNA fragments using 3' and 5' RACE. From these cDNA
fragments a consensus sequence of over 3 kb could be compiled which contained one long
open reading frame (Figure 2) which is highly homologous to other SBE II (class A) genes
(data not shown). It is likely that the consensus sequence does not represent that of a
single gene since attempts to PCR a full length gene using primers at the 5' and 3' ends
of this sequence were not successful. In fact, screening of a number of leaf derived 3'
RACE cDNAs showed that a second SBE II gene (clone designated pSJ101) was also
expressed which is highly homologous within the coding region to the originally isolated
cDNA (pSJ94) but has a different 3' UTR. A full length SBE II gene was isolated from
leaves and roots by PCR using a new primer to the 3' end of this sequence and the
original sequence at the 5' end of the consensus sequence. If the frequency of clones
isolated by 3' RACE PCR reflects the abundance of the mRNA levels then this full length
gene may be expressed at lower levels in the leaf than the pSJ94 clone (2 out of 15 were
the former class, 13/15 the latter). It should be noted that each class is expressed in both
leaves and roots as judged by PCR (data not shown). Sequence analysis of the predicted
ORF of the leaf and root genes showed only a few differences (4 amino acid changes and
one deletion) which could have arisen through PCR errors or, alternatively, there may be
more than one nearly identical gene expressed in these tissues.

A comparison of all known SBE II protein sequences shows that the cassava SBE II gene
is most closely related to the pea gene (Figure 8). The two proteins are 66.3% identical
over a 686 amino acid range which extends from the triple proline "elbow" (Burton et al.,
1995 Plant J. 7, 3-15) to the conserved WYA sequence immediately preceding the
C-terminal extensions (data not shown). All SBE II proteins are conserved over this range
in that they are at least 80% similar to each other. Remarkably however, the sequence
conservation between the pea, potato and cassava SBE II proteins also extends to the
N-terminal transit peptide, especially the first 12 amino acids of the precursor protein and
the region surrounding the mature terminus of the pea protein (AKFSRDS). Because the
proteins are so similar around this region it can be predicted that the mature terminus of
the cassava SBE II protein is likely to be GKSSHES. The precursor has a predicted
molecular mass of 96 kD and the mature protein a predicted molecular mass of 91.3 kD.
The cassava SBE II has a short acidic tail at the C-terminus although this is not as long or
as acidic as that found in the pea or potato proteins. The significance of this acidic tail,
if any, remains to be determined. One notable difference between the amino acid
sequence of cassava SBE II and all other SBE II proteins is the presence of the sequence
NSKH at around position 697 instead of the conserved sequence DAD/EY. Although this
conserved region forms part of a predicted α-helix (number 8) of the catalytic (β/α)8 barrel
domain (Burton et al., 1995 cited previously), this difference does not abolish the SBE
activity of the cassava protein as this gene can still complement the glycogen branching
deletion mutant of E. coli. It may however affect the specificity of the protein. An
interesting point is that the other cassava SBE II clone pSJ94 has the conserved sequence
DADY.



One other point of interest concerning the sequence of the SBE II gene is the presence of
an upstream ATG in the 5' UTR. This ATG could initiate a small peptide of 42 amino
acids which would terminate downstream of the predicted initiating methionine codon of
the SBE II precursor. If this does occur then the translation of the SBE II protein from this
mRNA is likely to be inefficient as ribosomes normally initiate at the 5'-most ATG in the
mRNA. However the first ATG is in a poorer Kozak context than the SBE II initiator and
it may be too close to the 5' end of the message to initiate efficiently (14 nucleotides), thus
allowing initiation to occur at the correct ATG.

In conclusion we have shown that cassava does have SBE II gene sequences, that they are
expressed in both leaves and tubers and that more than one gene exists.

Example 2 

Cloning of a second full length cassava SBE II gene



Methods 
Oligonucleotides 

CSBE219 CTTTATCTATTAAAGACTTC (Seq ID No. 20) 

CSBE220 CAAAAAAGTTTGTGACATGG (Seq ID No. 21) 

CSBE221 TCACTTTTTCCAATGCTAAT (Seq ID No. 22) 

CSBE222 TCTCATGCAATGGAACCGAC (Seq ID No. 23) 

CSBE223 CAGATGTCCTGACTCGGAAT (Seq ID No. 24) 

CSBE224 ATTCCGAGTCAGGACATCTG (Seq ID No. 25)

CSBE225 CGCATTTCTCGCTATTGCTT (Seq ID No. 26)

CSBE226 CACAGGCCCAAGTGAAGAAT (Seq ID No. 27) 
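
Some primers in this list are reverse complements of one another, i.e. they anneal to the same site on opposite strands. The following is a minimal sketch of how that relationship can be checked, using the sequences of CSBE223 and CSBE224; the helper name `reverse_complement` is illustrative and is not part of the described method.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


csbe223 = "CAGATGTCCTGACTCGGAAT"  # Seq ID No. 24
csbe224 = "ATTCCGAGTCAGGACATCTG"  # Seq ID No. 25

same_site = reverse_complement(csbe223) == csbe224
print(same_site)
```

The same check can be applied to any other pair of 20-mers in the listing to see whether they target the same site.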

The 5' end of the gene corresponding to the 3' RACE clone pSJ94 was isolated in three
rounds of 5' RACE. Prior to performing the first round of 5' RACE, 5 µg of total leaf
RNA was reverse transcribed in a 20 µl reaction using conditions as described by the
manufacturer (Superscript enzyme, BRL) and 10 pmol of the SBE II gene specific primer
CSBE23. Primers were then removed and the cDNA tailed with dATP as described
above. The first round of 5' RACE used primers CSBE216 and Ro. This PCR reaction
was diluted 1:20 and used as a template for a second round of amplification using primers
CSBE217 and Ri. The gene specific primers were designed so that they would
preferentially hybridise to the SBE II sequence in pSJ94. Amplified products appeared
as a smear of approximately 600-1200 bp when subjected to electrophoresis on a 1% TAE
agarose gel.

This smear was excised and DNA purified using a Qiaquick column (Qiagen) before
ligation to the pT7Blue vector. Several clones were sequenced and clone #7 was
designated pSJ125. New primers (CSBE219 and 220) were designed to hybridise to the
5' end of pSJ125 and a second round of 5' RACE was performed using the same CSBE23
primed library. Two fragments of 600 and 800 bp were cloned and sequenced (clones
13, 17). Primers CSBE221 and 222 were designed to hybridise to the 5' sequence of the
longest clone (#13) and a third round of 5' RACE was performed on a new library (5 µg
total leaf RNA reverse transcribed with Superscript using CSBE220 as primer and then
dATP tailed with TdT from Boehringer Mannheim). Fragments of approximately 500 bp
were amplified, cloned and sequenced. Clone 813 was designated pSJ143. The process
is illustrated schematically in Figure 12.

To isolate a full length gene as a contiguous sequence, a new primer (CSBE225) was
designed to hybridise to the 5' end of clone pSJ143 and used with one of the primers
(CSBE226 or 23) in the 3' end of clone pSJ94, in a PCR reaction using RoRidT17 primed
leaf cDNA as template. Use of primer CSBE226 resulted in production of clone #2
(designated pSJ144), and use of primer CSBE23 resulted in production of clones #10 and
13 (designated pSJ145 and pSJ146 respectively). Only pSJ146 was sequenced fully.

Results 

Isolation of a second full length cassava SBE II gene

A full length clone for a second SBE II gene was isolated by extending the sequence of
pSJ94 in three rounds of 5' RACE as illustrated schematically in Figure 12. In each
round of 5' RACE, primers were designed that would preferentially hybridise to the new
sequence rather than to the gene represented by pSJ116. In the final round of 5' RACE,
three clones were obtained that had the initiating methionine codon, and none of these had
upstream ATGs. The overlapping cDNA fragments (sequences of the 5' RACE clones
pSJ143, 13, pSJ125 and the 3' RACE clone pSJ94) could be assembled into a consensus
sequence of approximately 3 kb which was designated csbe2-2.seq. This sequence
contained one long ORF with a predicted size of 848 aa (Mr 97 kDa). The full length
gene was then isolated as a contiguous sequence by PCR amplification from RoRidT17
primed leaf cDNA using primers at the 5' (CSBE225) and 3' (CSBE23 or CSBE226) ends
of the RACE clones. One clone, designated pSJ146, was sequenced and the restriction
map is shown along with the predicted amino acid sequence in Figure 13.

Sequence homologies between SBE II genes

The two cassava genes (pSJ116 and pSJ146) share 88.8% identity at the DNA level over
the entire coding region (data not shown). The homology extends about 50 bases outside
of this region but beyond this the untranslated regions show no similarity (data not
shown). At the protein level the two genes show 86% identity over the entire ORF (data
not shown). The two genes are more closely related to each other than to any other SBE
II. Between species, the pea SBE I shows the most homology to the cassava SBE II
genes.
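
For readers unfamiliar with how identity figures of this kind are obtained, the following is a minimal sketch of a percent-identity calculation over two pre-aligned sequences. It is not the alignment software used for the figures above (which the text does not name), and it adopts one common convention, counting only columns without gaps; other conventions give slightly different values. The example compares the 20-mer of Seq ID No. 2 with the matching 20-base stretch of Seq ID No. 28, which differ at a single position.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two already-aligned sequences of equal length.

    Columns containing a gap character ('-') in either sequence are ignored
    (an assumed convention for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


seq_id_2 = "GGTTTCATCACTTCTGAGCA"    # Seq ID No. 2
seq_id_28 = "GGTTTCATGACTTCTGAGCA"   # matching stretch of Seq ID No. 28
identity = percent_identity(seq_id_2, seq_id_28)
print(f"{identity:.1f}% identical")
```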



Example 3 

Construction of plant transformation vectors and transformation of cassava with
antisense starch branching enzyme genes

This example describes in detail how a portion of the SBE II gene isolated from cassava 
may be introduced into cassava plants to create transgenic plants with altered properties. 

An 1100 bp Hind III - Sac I fragment of cassava SBE II (from plasmid pSJ94) was cloned
into the Hind III - Sac I sites of the plant transformation vector pSJ64 (Figure 11). This
placed the SBE II gene in an antisense orientation between the 2X 35S CaMV promoter
and the nopaline synthase polyadenylation signal. pSJ64 is a derivative of the
binary vector pGPTV-HYG (Becker et al., 1992 Plant Molecular Biology 20:
1195-1197) modified by inclusion of an approximately 750 bp fragment of
pJIT60 (Guerineau et al., 1992 Plant Mol. Biol. 18, 815-818) containing the
duplicated cauliflower mosaic virus (CaMV) 35S promoter (Cabb-JI strain,
equivalent to nucleotides 7040 to 7376 duplicated upstream of 7040 to 7433, as
described by Frank et al., 1980 Cell 21, 285-294) to replace the GUS coding
sequence. A similar construct was made with plasmid pSJ101.

These plasmids are then introduced [...] a direct DNA uptake method (An et al. Binary
vectors, In: Plant Molecular Biology Manual (ed [...] embryos by selecting on hygromycin
as described by Li et al. (1996, Nature Biotechnology 14, 738-740).

The term "comprise" and grammatical variations thereof used in the description or
claims does not preclude the presence of additional features, integers, steps or
components.

SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT:
(A) NAME: National Starch and Chemical Investment Holding Corporation
(B) STREET: Suite 27, 501 Silverside Road
(C) CITY: Wilmington
(D) STATE: Delaware
(E) COUNTRY: USA
(F) POSTAL CODE (ZIP): 19809

(ii) TITLE OF INVENTION: Improvements in or Relating to Starch Content of Plants

(iii) NUMBER OF SEQUENCES: 31

(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
ATGGACAAGG ATATGTATGA 20

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
GGTTTCATCA CTTCTGAGCA 20

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
TGCTCAGAAG TCATGAAACC 20

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
TCCAGTCTCA ATATACGTCG 20

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
AGGAGTAGAT GGTCTGTCGA 20

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
TCATACATAT CCTTGTCCAT 20

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GGGTGACTTC AATGATGTAC 20

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GGTGTACATC ATTGAAGTCA 20

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
AATTACTGGC TCCGTACTAC 20

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CATTCCAACG TGCGACTCAT 20

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
TACCGGTAAT CTAGGTGTTG 20






- WO 90/2*145 



(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
GGACCTTGGT TTAGATCCAA 20

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
ATGAGTCGCA CGTTGGAATG 20

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
CAACACCTAG ATTACCGGTA 20

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
TTACTTGCGT CAGTTCTCAC 20

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
AATATCTATC TCAGCCGGAG 20

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
ATCTTAGATA GTCTGCATCA 20

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
TGGTTGTTCC CTGGAATTAC 20

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
TGCAAGGACC GTGACATCAA 20

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
CTTTATCTAT TAAAGACTTC 20

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
CAAAAAAGTT TGTGACATGG 20

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
TCACTTTTTC CAATGCTAAT 20

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
TCTCATGCAA TCGAACCGAC 20

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CAGATGTCCT GACTCGGAAT 20

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
ATTCCGAGTC AGGACATCTG 20

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
CGCATTTCTC GCTATTGCTT 20

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
CACAGGCCCA AGTGAAGAAT 20

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2588 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 21..2531

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

CTCTCTAACT TCTCAGCGAA ATG GGA CAC TAC ACC ATA TCA GGA ATA CGT 50
Met Gly His Tyr Thr Ile Ser Gly Ile Arg
1 5 10

TTT CCT TGT GCT CCA CTC TGC AAA TCT CAA TCT ACC GGC TTC CAT GGC 98
Phe Pro Cys Ala Pro Leu Cys Lys Ser Gln Ser Thr Gly Phe His Gly
15 20 25

TAT CGG AGG ACC TCC TCT TGC CTT TCC TTC AAC TTC AAG GAG GCG TTT 146
Tyr Arg Arg Thr Ser Ser Cys Leu Ser Phe Asn Phe Lys Glu Ala Phe
30 35 40

TCT AGG AGG GTC TTC TCT GGA AAG TCA TCT CAT GAA TCT GAC TCC TCA 194
Ser Arg Arg Val Phe Ser Gly Lys Ser Ser His Glu Ser Asp Ser Ser
45 50 55

AAT GTA ATG [...] 242
Asn Val Met Val Thr Ala Ser Lys Arg Val Leu Pro Asp Gly Arg Ile
60 65 70

GAA TGC TAT TCT TCT TCA ACA GAT CAA TTG GAA GCC CCT GGC ACA GTT 290
Glu Cys Tyr Ser Ser Ser Thr Asp Gln Leu Glu Ala Pro Gly Thr Val
75 80 85 90

TCA GAA GAA TCC CAG GTG CTT ACT GAT GTT GAG AGT CTC ATT ATG GAT 338
Ser Glu Glu Ser Gln Val Leu Thr Asp Val Glu Ser Leu Ile Met Asp
95 100 105

GAT AAG ATT GTT GAA GAT GAA GTA AAT AAA GAA TCT GTT CCA ATG CGG 386
Asp Lys Ile Val Glu Asp Glu Val Asn Lys Glu Ser Val Pro Met Arg
110 115 120

GAG ACA GTT [...] 434
Glu Thr Val Ser Ile Arg Lys Ile Gly Ser Lys Pro Arg Ser Ile Pro
125 130 135

CCA CCC GGC AGA GGG CAA AGA ATA TAT GAC ATA GAT [...] 482
Pro Pro Gly Arg Gly Gln Arg Ile Tyr Asp Ile Asp Pro Ser Leu Thr
140 145 150

GGC TTT CGT CAA CAC CTA GAT TAC CGG TAT TCA CAG TAC AAA AGA CTC 530
Gly Phe Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys Arg Leu
155 160 165 170

CGA GAA GAA ATT GAC AAG TAT GAA GGT AGT CTG GAT GCA TTT TCT CGT 578
Arg Glu Glu Ile Asp Lys Tyr Glu Gly Ser Leu Asp Ala Phe Ser Arg
175 180 185

GGC TAT GAA AAG TTT GGT TTC TCA CGC AGT GAA ACA GGA ATA ACT TAT 626
Gly Tyr Glu Lys Phe Gly Phe Ser Arg Ser Glu Thr Gly Ile Thr Tyr
190 195 200

AGA GAG TGG GCA CCA GGA GCT ACG TGG GCT GCA TTG ATT GGA GAT TTC 674
Arg Glu Trp Ala Pro Gly Ala Thr Trp Ala Ala Leu Ile Gly Asp Phe
205 210 215

AAT AAC TGG AAT CCT AAT GCA GAT GTC ATG ACT CAG AAT GAG TGT GGT 722
Asn Asn Trp Asn Pro Asn Ala Asp Val Met Thr Gln Asn Glu Cys Gly
220 225 230

GTC TGG GAG ATC TTT TTG CCG AAT AAT GCA GAT GGT TCA CCA CCA ATT 770
Val Trp Glu Ile Phe Leu Pro Asn Asn Ala Asp Gly Ser Pro Pro Ile
235 240 245 250

CCC CAT GGT TCT CGA GTA AAG ATA CGC ATG GAT ACT CCA TCT GGC AAC 818
Pro His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro Ser Gly Asn
255 260 265

AAA GAT TCT ATT CCT GCT TGG ATC AAG TTC TCA GTT CAA GCA CCA GGT 866
Lys Asp Ser Ile Pro Ala Trp Ile Lys Phe Ser Val Gln Ala Pro Gly
270 275 280

GAA CTC CCA TAT AAT GGC ATA TAC TAT GAT CCT CCC GAG GAG GAG AAG 914
Glu Leu Pro Tyr Asn Gly Ile Tyr Tyr Asp Pro Pro Glu Glu Glu Lys
285 290 295

TAT GTG TTC AAA AAT CCT CAG CCA AAG AGA CCA AAA TCA CTT CGG ATT 962
Tyr Val Phe Lys Asn Pro Gln Pro Lys Arg Pro Lys Ser Leu Arg Ile
300 305 310

TAT GAG TCG CAC GTT GGA ATG AGT AGT ACG GAG CCA GTA ATT AAC ACA 1010
Tyr Glu Ser His Val Gly Met Ser Ser Thr Glu Pro Val Ile Asn Thr
315 320 325 330

TAT GCC AAC TTT AGA GAT GAT GTG CTT CCT CGC ATC AAA AAG CTT GGC 1058
Tyr Ala Asn Phe Arg Asp Asp Val Leu Pro Arg Ile Lys Lys Leu Gly
335 340 345

TAC AAT GCT GTT CAG CTC ATG GCT ATT CAA GAG CAT TCA TAT TAT GCT 1106
Tyr Asn Ala Val Gln Leu Met Ala Ile Gln Glu His Ser Tyr Tyr Ala
350 355 360

AGT TTT GGG TAT CAC GTC ACA AAC TTT TAT GCA GCT AGC AGC CGA TTT 1154
Ser Phe Gly Tyr His Val Thr Asn Phe Tyr Ala Ala Ser Ser Arg Phe
365 370 375

GGA ACT CCT GAT GAT TTA AAG TCT CTA ATA GAT AAA GCT CAC GAG TTA 1202
Gly Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala His Glu Leu
380 385 390

GGT CTT CTT GTT CTC ATG GAT ATT GTT CAT AGC CAT GCA TCA ACT AAT 1250
Gly Leu Leu Val Leu Met Asp Ile Val His Ser His Ala Ser Thr Asn
395 400 405 410

ACG TTG GAT GGG CTG AAT ATG TTT GAT GGT ACG GAT GGT CAC TAC TTT 1298
Thr Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Gly His Tyr Phe
415 420 425

CAC TCT GGA CCA CGG GGT CAT CAT TGG ATG TGG GAC TCT CGC CTT TTC 1346
His Ser Gly Pro Arg Gly His His Trp Met Trp Asp Ser Arg Leu Phe
430 435 440

AAC TAT GGG AGC TGG GAG GTT CTA AGG TTT CTT CTT TCA AAT GCA AGG 1394
Asn Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser Asn Ala Arg
445 450 455

TGG TGG TTG GAT GAG TAC AAG TTT GAT GGG TTC AGA TTT GAT GGG GTG 1442
Trp Trp Leu Asp Glu Tyr Lys Phe Asp Gly Phe Arg Phe Asp Gly Val
460 465 470

ACT TCA ATG ATG TAC ACC CAT CAT GGA TTG CAG GTA GAT TTT ACC GGC 1490
Thr Ser Met Met Tyr Thr His His Gly Leu Gln Val Asp Phe Thr Gly
475 480 485 490

AAC TAC AAT GAA TAC TTT GGA TAT GCA ACT GAT GTA GAT GCT GTG GTT 1538
Asn Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp Ala Val Val
495 500 505

TAT TTG ATG CTG TTG AAT GAT ATG ATT CAT GGT CTC TTC CCA GAG GCT 1586
Tyr Leu Met Leu Leu Asn Asp Met Ile His Gly Leu Phe Pro Glu Ala
510 515 520

GTC ACC ATT GGT GAA GAT GTT AGT GGA ATG CCA ACA GTT TGC ATT CCG 1634
Val Thr Ile Gly Glu Asp Val Ser Gly Met Pro Thr Val Cys Ile Pro
525 530 535

GTT GAA GAT GGT GGT GTT GGC TTT GAT TAT CGT CTC CAC ATG GCT GTT 1682
Val Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His Met Ala Val
540 545 550

GCT GAT AAA TGG GTT GAG ATT ATT CAG AAG AGA GAT GAA GAT TGG AAA 1730
Ala Asp Lys Trp Val Glu Ile Ile Gln Lys Arg Asp Glu Asp Trp Lys
555 560 565 570

ATG GGT GAC ATT GTA CAT ATG CTG ACC AAC AGG CGG TGG TTG GAA AAG 1778
Met Gly Asp Ile Val His Met Leu Thr Asn Arg Arg Trp Leu Glu Lys
575 580 585

TGT GTT TCT TAT GCT GAA AGT CAT GAC CAG GCC CTT GTT GGT GAC AAA 1826
Cys Val Ser Tyr Ala Glu Ser His Asp Gln Ala Leu Val Gly Asp Lys
590 595 600

ACT ATT GCA TTT TGG CTG ATG GAC AAG GAT ATG TAT GAC TTC ATG GCT 1874
Thr Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp Phe Met Ala
605 610 615

CTT GAC AGA CCA TCT ACT CCT CTC ATA GAT CGT GGA GTA GCA TTG CAC 1922
Leu Asp Arg Pro Ser Thr Pro Leu Ile Asp Arg Gly Val Ala Leu His
620 625 630

AAA ATG ATC AGG CTT ATT ACC ATG GGA TTA GGC GGA GAA GGA TAT TTG 1970
Lys Met Ile Arg Leu Ile Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu
635 640 645 650

AAT TTT ATG GGA AAT GAA TTT GGA CAC CCC GAG TGG ATT GAT TTT CCA 2018
Asn Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile Asp Phe Pro
655 660 665

AGA GGT GAT CTA CAT CTT CCC AQT GGT AAA TTT GTT CCT GGG AAC AAT 2066 
Arg Gly Asp Leu His Leu Pro Ser Gly Lys Phe Val Pro Gly Asn Asn 
670 675 680 
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TAC AGT TAT GAT AAA TGC CGG CGT AG6 TTT GAT CTA GGC AAT TCA AA6 2114 

Gly. 
695 



Tyr Ser Tgr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly Asn Ser Lys 



CAT CTG AGA TAT CAT GGA ATG CAA GAG TTT GAT CAA GCA ATT CAG CAT 2162 
His Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala Ile Gln His 
700 705 710 

CTT GAA GAA GCC TAT GGT TTC ATG ACT TCT GAG CAC CAA TAC ATA TCA 2210 
Leu Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln Tyr Ile Ser 
715 720 725 730 

CGG AAG GAT GAA AGG GAT CGG ATC ATT GTC TTC GAG AGG GGA AAC CTC 2258 
Arg Lys Asp Glu Arg Asp Arg Ile Ile Val Phe Glu Arg Gly Asn Leu 
735 740 745 

GTT TTT GTA TTC AAT TTT CAT TGG ACT AGC AGC TAT TCG GAT TAC CGA 2306 
Val Phe Val Phe Asn Phe His Trp Thr Ser Ser Tyr Ser Asp Tyr Arg 
750 755 760 

GTT GGC TGC TTA AAG CCA GGA AAG TAC AAG ATA GTC TTG GAT TCA GAT 2354 
Val Gly Cys Leu Lys Pro Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp 
765 770 775 

GAT CCT TTG TTT GGA GGC TTT GGC AGG CTT AGT CAT GAT GCA GAG CAC 2402 
Asp Pro Leu Phe Gly Gly Phe Gly Arg Leu Ser His Asp Ala Glu His 
780 785 790 

TTC AGC TTT GAA GGG TGG TAC GAT AAC CGG CCT CGA TCC TTC ATG GTG 2450 
Phe Ser Phe Glu Gly Trp Tyr Asp Asn Arg Pro Arg Ser Phe Met Val 
795 800 805 810 

TAC ACA CCA TGT AGA ACA GCA GTG GTC TAT GCT TTA GTG GAG GAT GAA 2498 
Tyr Thr Pro Cys Arg Thr Ala Val Val Tyr Ala Leu Val Glu Asp Glu 
815 820 825 

GTG GAG AAT GAA TTG GAA CCT GTC GCC GGT TAA GATATATCTT AACAACAGGT 2551 
Val Glu Asn Glu Leu Glu Pro Val Ala Gly * 
830 835 

TCTGAAGCAG GAATGCCATT ATTGATCTTC CTATGTT 2588 
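
The translation lines printed under each nucleotide row can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code and rows written as space-separated codons; the function name `translate_row` and the `???` marker for unreadable codons are illustrative choices, not part of the listing.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "*",
}
# Map each of the 64 codons to its three-letter residue code.
CODON_TABLE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), AMINO_1)
}


def translate_row(row: str) -> str:
    """Translate a row of space-separated codons; unreadable codons become '???'."""
    return " ".join(CODON_TABLE.get(codon.upper(), "???") for codon in row.split())


# Final coding row of the listing above (nucleotides ending at 2531).
print(translate_row("GTG GAG AAT GAA TTG GAA CCT GTC GCC GGT TAA"))
```

A row whose printed translation disagrees with the computed one, or which yields `???`, points to a character misread in the nucleotide line or in the translation line.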

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 837 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Met Gly His Tyr Thr Ile Ser Gly Ile Arg Phe Pro Cys Ala Pro Leu 
1 5 10 15 

Cys Lys Ser Gln Ser Thr Gly Phe His Gly Tyr Arg Arg Thr Ser Ser 
20 25 30 

Cys Leu Ser Phe Asn Phe Lys Glu Ala Phe Ser Arg Arg Val Phe Ser 
35 40 45 

Gly Lys Ser Ser His Glu Ser Asp Ser Ser Asn Val Met Val Thr Ala 
50 55 60 

Ser Lys Arg Val Leu Pro Asp Gly Arg Ile Glu Cys Tyr Ser Ser Ser 
65 70 75 80 

Thr Asp Gln Leu Glu Ala Pro Gly Thr Val Ser Glu Glu Ser Gln Val 
85 90 95 

Leu Thr Asp Val Glu Ser Leu Ile Met Asp Asp Lys Ile Val Glu Asp 
100 105 110 

Glu Val Asn Lys Glu Ser Val Pro Met Arg Glu Thr Val Ser Ile Arg 
115 120 125 

Lys Ile Gly Ser Lys Pro Arg Ser Ile Pro Pro Pro Gly Arg Gly Gln 
130 135 140 

Arg Ile Tyr Asp Ile Asp Pro Ser Leu Thr Gly Phe Arg Gln His Leu 
145 150 155 160 

Asp Tyr Arg Tyr Ser Gln Tyr Lys Arg Leu Arg Glu Glu Ile Asp Lys 
165 170 175 

Tyr Glu Gly Ser Leu Asp Ala Phe Ser Arg Gly Tyr Glu Lys Phe Gly 
180 185 190 

Phe Ser Arg Ser Glu Thr Gly Ile Thr Tyr Arg Glu Trp Ala Pro Gly 
195 200 205 

Ala Thr Trp Ala Ala Leu Ile Gly Asp Phe Asn Asn Trp Asn Pro Asn 
210 215 220 

Ala Asp Val Met Thr Gln Asn Glu Cys Gly Val Trp Glu Ile Phe Leu 
225 230 235 240 

Pro Asn Asn Ala Asp Gly Ser Pro Pro Ile Pro His Gly Ser Arg Val 
245 250 255 

Lys Ile Arg Met Asp Thr Pro Ser Gly Asn Lys Asp Ser Ile Pro Ala 
260 265 270 

Trp Ile Lys Phe Ser Val Gln Ala Pro Gly Glu Leu Pro Tyr Asn Gly 
275 280 285 

Ile Tyr Tyr Asp Pro Pro Glu Glu Glu Lys Tyr Val Phe Lys Asn Pro 
290 295 300 

Gln Pro Lys Arg Pro Lys Ser Leu Arg Ile Tyr Glu Ser His Val Gly 
305 310 315 320 

Met Ser Ser Thr Glu Pro Val Ile Asn Thr Tyr Ala Asn Phe Arg Asp 
325 330 335 

Asp Val Leu Pro Arg Ile Lys Lys Leu Gly Tyr Asn Ala Val Gln Leu 
340 345 350 

Met Ala Ile Gln Glu His Ser Tyr Tyr Ala Ser Phe Gly Tyr His Val 
355 360 365 

Thr Asn Phe Tyr Ala Ala Ser Ser Arg Phe Gly Thr Pro Asp Asp Leu 
370 375 380 

[illegible] [illegible] Leu Ile Asp Lys Ala His Glu Leu Gly Leu Leu Val Leu Met 
385 390 395 400 

[illegible] [illegible] Val His Ser His Ala Ser Thr Asn Thr Leu Asp Gly Leu Asn 
405 410 415 

Met Phe Asp Gly Thr Asp Gly His Tyr Phe His Ser Gly Pro Arg Gly 
420 425 430 

His His [illegible] Met Trp Asp Ser Arg Leu Phe Asn Tyr Gly Ser Trp Glu 
435 440 445 

Val Leu Arg Phe Leu Leu Ser Asn Ala Arg Trp Trp Leu Asp Glu Tyr 
450 455 460 

[illegible] Phe Asp Gly Phe Arg Phe Asp Gly Val Thr Ser Met Met Tyr Thr 
465 470 475 480 

His His Gly Leu Gln Val Asp Phe Thr Gly Asn Tyr Asn Glu Tyr Phe 
485 490 495 

Gly Tyr Ala Thr Asp Val Asp Ala Val Val Tyr Leu Met Leu Leu Asn 
500 505 510 

Asp Met Ile His Gly Leu Phe Pro Glu Ala Val Thr Ile Gly Glu Asp 
515 520 525 

Val Ser Gly Met Pro Thr Val Cys Ile Pro Val Glu Asp Gly Gly Val 
530 535 540 

[illegible] Phe Asp Tyr Arg Leu His Met Ala Val Ala Asp Lys Trp Val Glu 
545 550 555 560 

Ile Ile Gln Lys Arg Asp Glu Asp Trp Lys Met Gly Asp Ile Val His 
565 570 575 

Met Leu Thr Asn Arg Arg Trp Leu Glu Lys Cys Val Ser Tyr Ala Glu 
580 585 590 

Ser His [illegible] Gln Ala Leu Val Gly Asp Lys Thr Ile Ala Phe Trp Leu 
595 600 605 

Met Asp Lys Asp Met Tyr Asp Phe Met Ala Leu Asp Arg Pro Ser Thr 
610 615 620 

Pro Leu Ile Asp Arg Gly Val Ala Leu His Lys Met Ile Arg Leu Ile 
625 630 635 640 

Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu Asn Phe Met Gly Asn Glu 
645 650 655 

Phe Gly His Pro Glu Trp Ile Asp Phe Pro Arg Gly Asp Leu His Leu 
660 665 670 

Pro Ser Gly Lys Phe Val Pro Gly Asn Asn Tyr Ser Tyr Asp Lys Cys 
675 680 685 

Arg Arg Arg Phe Asp Leu Gly Asn Ser Lys His Leu Arg Tyr His Gly 
690 695 700 

Met Gln Glu Phe Asp Gln Ala Ile Gln His Leu Glu Glu Ala Tyr Gly 
705 710 715 720 

Phe Met Thr Ser Glu His Gln Tyr Ile Ser Arg Lys Asp Glu Arg Asp 
725 730 735 

Arg Ile Ile Val Phe Glu Arg Gly Asn Leu Val Phe Val Phe Asn Phe 
740 745 750 

His Trp Thr Ser Ser Tyr Ser Asp Tyr Arg Val Gly Cys Leu Lys Pro 
755 760 765 

Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp Asp Pro Leu Phe Gly Gly 
770 775 780 

Phe Gly Arg Leu Ser His Asp Ala Glu His Phe Ser Phe Glu Gly Trp 
785 790 795 800 

Tyr Asp Asn Arg Pro Arg Ser Phe Met Val Tyr Thr Pro Cys Arg Thr 
805 810 815 

Ala Val Val Tyr Ala Leu Val Glu Asp Glu Val Glu Asn Glu Leu Glu 
820 825 830 

Pro Val Ala Gly * 
835 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 2805 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 131..2677 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
AGTGAATTCG AGCTCGGTAC CCGGGGATCC GATTCGCATT TCTCGCTATT GCTTTCCGTT 60 

TATTTCCATA TATAAAATAT CAAATCTAAT CACTOCGCC ATTTUTATCT CTCTCCAAAC 120 

TCTCACCGAA ATG GTA TAC TAC ACT GTA TCA GGC ATA CGT TTT CCT TGT 169 
Met Val Tyr Tyr Thr Val Ser Gly Ile Arg Phe Pro Cys 
840 845 850 

GCA CCT TCA CTC TAC AAA TCT CAG CTC ACC AGC TTC CAT GGC GGT CGA 217 
Ala Pro Ser Leu Tyr Lys Ser Gln Leu Thr Ser Phe His Gly Gly Arg 
855 860 865 

AGG ACC TCT TCT GGC CTT TCC TTC CTC TTG AAG AAG GAG CTG TTT CCT 265 
Arg Thr Ser Ser Gly Leu Ser Phe Leu Leu Lys Lys Glu Leu Phe Pro 
870 875 880 

CGG AAG ATC TTT GCT GGA AAG TCC TCT TAT GAA TCT GAC TCC TCA AAT 313 
Arg Lys Ile Phe Ala Gly Lys Ser Ser Tyr Glu Ser Asp Ser Ser Asn 
885 890 895 

TTA ACT GTC TCT GCA TCT GAG AAG GTC CTT GTT CCT GAT GAT CAG ATT 361 
Leu Thr Val Ser Ala Ser Glu Lys Val Leu Val Pro Asp Asp Gln Ile 
900 905 910 

GAT GGC TCT TCT TCT TCA ACA TAT CAA TTA GAA ACC ACT GGC ACA GTT 409 
Asp Gly Ser Ser Ser Ser Thr Tyr Gln Leu Glu Thr Thr Gly Thr Val 
915 920 925 930 

TTG GAG GAA TCC CAG GTT CTT GGT GAT GCA GAG AGT CTT GTG ATG GAA 457 
Leu Glu Glu Ser Gln Val Leu Gly Asp Ala Glu Ser Leu Val Met Glu 
935 940 945 

GAT GAT AAG AAT GTT GAG GAG GAT GAA GTA AAA AAA GAG TCG GTT CCA 505 
Asp Asp Lys Asn Val Glu Glu Asp Glu Val Lys Lys Glu Ser Val Pro 
950 955 960 

TTG CAT GAG ACA ATT AGC ATT GGA AAA AGT GAA TCT AAA CCA AGG TCC 553 
Leu His Glu Thr Ile Ser Ile Gly Lys Ser Glu Ser Lys Pro Arg Ser 
965 970 975 

ATT CCT CCA CCT GGC AGT GGG CAG AGA ATA TAT GAC ATA GAT CCA AGC 601 
Ile Pro Pro Pro Gly Ser Gly Gln Arg Ile Tyr Asp Ile Asp Pro Ser 
980 985 990 

TTG GCA GGT TTC CGT CAG CAT CTT GAC TAC CGA TAT TCA CAG TAC AAA 649 
Leu Ala Gly Phe Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys 
995 1000 1005 1010 

AGG CTG CGT GAG GAA ATT GAC AAG TAT GAA GGT GGT TTG GAT GCA TTC 697 
Arg Leu Arg Glu Glu Ile Asp Lys Tyr Glu Gly Gly Leu Asp Ala Phe 
1015 1020 1025 

TCT CGT GGA TTT GAA AAG TTT GGT TTC TTA CGC AGT GAA ACA GGA ATA 745 
Ser Arg Gly Phe Glu Lys Phe Gly Phe Leu Arg Ser Glu Thr Gly Ile 
1030 1035 1040 

ACT TAT AGG GAA TGG GCA CCT GGA GCT ACG TGG GCT GCA CTT ATT GGA 793 
Thr Tyr Arg Glu Trp Ala Pro Gly Ala Thr Trp Ala Ala Leu Ile Gly 
1045 1050 1055 

GAT TTC AAC AAT TGG AAT CCT AAT GCA GAT GTC ATG ACT CGG AAT GAG 841 
Asp Phe Asn Asn Trp Asn Pro Asn Ala Asp Val Met Thr Arg Asn Glu 
1060 1065 1070 

TTT GGT GTC TGG GAG ATT TTT TTG CCA AAT AAC GCA GAT GGT TCA CCA 889 
Phe Gly Val Trp Glu Ile Phe Leu Pro Asn Asn Ala Asp Gly Ser Pro 
1075 1080 1085 1090 

CCA ATT CCT CAT GGT TCT CGA GTA AAG ATA CGC ATG GAT ACT CCA TCT 937 
Pro Ile Pro His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro Ser 
1095 1100 1105 

GGC ATC AAA GAT TCA ATT CCT GCT TGG ATC AAG TTC TCA GTT CAG GCA 985 
Gly Ile Lys Asp Ser Ile Pro Ala Trp Ile Lys Phe Ser Val Gln Ala 
1110 1115 1120 

CCT GGT GAA ATC CCA TAC AAT GCC ATA TAC TAT GAT CCA CCA AAG GAG 1033 
Pro Gly Glu Ile Pro Tyr Asn Ala Ile Tyr Tyr Asp Pro Pro Lys Glu 
1125 1130 1135 

GAG AAG TAT GTG TTC AAA CAT CCT CAG CCA AAG AGA CCA AAA TCA CTT 1081 
Glu Lys Tyr Val Phe Lys His Pro Gln Pro Lys Arg Pro Lys Ser Leu 
1140 1145 1150 

AGG [illegible] TAT GAA TCT CAT GTT GGG ATG AGT AGT ATG GAG CCA ATA ATT 1129 
Arg Ile Tyr Glu Ser His Val Gly Met Ser Ser Met Glu Pro Ile Ile 
1155 1160 1165 1170 

AAC ACA TAT GCC AAC TTT AGA GAT GAT ATG CTT CCT CGC ATC AAA AAG 1177 
Asn Thr Tyr Ala Asn Phe Arg Asp Asp Met Leu Pro Arg Ile Lys Lys 
1175 1180 1185 

CTT GGC TAC AAT GCT GTT CAG ATC ATG GCT ATT CAA GAG CAT TCC TAT 1225 
Leu Gly Tyr Asn Ala Val Gln Ile Met Ala Ile Gln Glu His Ser Tyr 
1190 1195 1200 

TAT GCT AGT TTT GGG TAC CAT GTC ACA AAC TTT TTT GCA CCT AGC AGC 1273 
Tyr Ala Ser Phe Gly Tyr His Val Thr Asn Phe Phe Ala Pro Ser Ser 
1205 1210 1215 

CGA TTT GGA ACT CCT GAT GAT TTG AAG TCT TTA ATA GAT AAA GCT CAT 1321 
Arg Phe Gly Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala His 
1220 1225 1230 

GAG TTA GGG CTG CTT GTT CTC ATG GAT ATT GTT CAT AGC CAT GCG TCA 1369 
Glu Leu Gly Leu Leu Val Leu Met Asp Ile Val His Ser His Ala Ser 
1235 1240 1245 1250 

AAT AAT ACG TTG GAT GGG CTG AAC ATG TTT GAT GGT ACG GAT AGT CAC 1417 
Asn Asn Thr Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Ser His 
1255 1260 1265 

TAC TTC CAC TCC GGA TCA CGG GGT CAT CAT TGG TTG TGG GAC TCT CGC 1465 
Tyr Phe His Ser Gly Ser Arg Gly His His Trp Leu Trp Asp Ser Arg 
1270 1275 1280 

[6 codons illegible] TGG [2 codons illegible] CTA AGA TTT CTT CTT TCA AAT 1513 
Leu Phe Asn Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser Asn 
1285 1290 1295 

GCA AGA TGG TGG TTG GAA GAG TAC AGG TTT GAT GGT TTT AGA TTT GAT 1561 
Ala Arg Trp Trp Leu Glu Glu Tyr Arg Phe Asp Gly Phe Arg Phe Asp 
1300 1305 1310 

GGG GTG ACT TCC ATG ATG TAC ACT CCC CAT GGG TTG CAG GTA GCT TTT 1609 
Gly Val Thr Ser Met Met Tyr Thr Pro His Gly Leu Gln Val Ala Phe 
1315 1320 1325 1330 

ACT GGC AAC TAC AAT GAG TAC TTT GGA TAT GCA ACT GAT GTA GAT GCT 1657 
Thr Gly Asn Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp Ala 
1335 1340 1345 

GTG ATT TAT TTG ATG CTT GTG AAT GAT ATG ATT [illegible] GGT CTT TTC CCT 1705 
Val Ile Tyr Leu Met Leu Val Asn Asp Met Ile His Gly Leu Phe Pro 
1350 1355 1360 

GAG GCT GTT ACC ATT GGT GAA GAT GTT AGC GGA AAG CCA ACA TTT TGC 1753 
Glu Ala Val Thr Ile Gly Glu Asp Val Ser Gly Lys Pro Thr Phe Cys 
1365 1370 1375 



[4 codons illegible] GAT GGT GGT GTT GGA [7 codons illegible] 1801 
Ile Pro Val Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His Met 
1380 1385 1390 

[4 codons illegible] AAA TGG ATT GAG ATT [7 codons illegible] 1849 
Ala Ile Ala Asp Lys Trp Ile Glu Ile Leu Lys Lys Arg Asp Glu Asp 
1395 1400 1405 1410 



TGG AAA ATG GGT GAC ATT GTG CAT ACA CTC ACC AAC AGA AGG TGG TTC 1897 
Trp Lys Met Gly Asp Ile Val His Thr Leu Thr Asn Arg Arg Trp Leu 
1415 1420 1425 

GAA AAA TGT GTT GCT TAT GCT GAA AGT CAT GAC CAA GCT CTT GTT GGT 1945 
Glu Lys Cys Val Ala Tyr Ala Glu Ser His Asp Gln Ala Leu Val Gly 
1430 1435 1440 

GAG AAA ACT ATT GCA TTT TGG CTG ATG GAC AAG GAC ATG TAC GAC TTC 1993 
Asp Lys Thr Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp Phe 
1445 1450 1455 

ATG GCT CGT GAC AGA CCA TCT ACT CCT CTT ATA GAT CGT GGA ATA GCA 2041 
Met Ala Arg Asp Arg Pro Ser Thr Pro Leu Ile Asp Arg Gly Ile Ala 
1460 1465 1470 

TTG CAC AAA ATG ATC AGG CTT ATT ACC ATG GGC TTA GGC GGA GAA GGA 2089 
Leu His Lys Met Ile Arg Leu Ile Thr Met Gly Leu Gly Gly Glu Gly 
1475 1480 1485 1490 

TAT TTG AAT TTT ATG GGA AAT GAA TTT GGA CAT CCT GAG TGG ATT GAT 2137 
Tyr Leu Asn Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile Asp 
1495 1500 1505 

TTT CCA AGA GGG GAT CGA CAT CTG CCC AAT GGT AAA GTA ATT CCA GGG 2185 
Phe Pro Arg Gly Asp Arg His Leu Pro Asn Gly Lys Val Ile Pro Gly 
1510 1515 1520 

AAC AAC CAC ACT TAT GAT AAA TGC CGT CGT AGA TTT GAT CTA GGT GAT 2233 
Asn Asn His Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly Asp 
1525 1530 1535 

GCA GAC TAT CTA AGA TAT CAT GGA ATG CAA GAG TTT GAT CAG GCA ATG 2281 
Ala Asp Tyr Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala Met 
1540 1545 1550 

CAA CAT CTT GAA GAA GCC TAT GGT TTC ATG ACT TCT GAG CAC CAG TAT 2329 
Gln His Leu Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln Tyr 
1555 1560 1565 1570 

ATA TCA CGG AAG GAT GAA GGA GAT CGG ATC ATT GTC TTT GAG AGG GGA 2377 
Ile Ser Arg Lys Asp Glu Gly Asp Arg Ile Ile Val Phe Glu Arg Gly 
1575 1580 1585 

AAC CTT GTT TTT GTA TTC AAC TTT CAT TGG ACT AAC AGC TAT TCA GAT 2425 
Asn Leu Val Phe Val Phe Asn Phe His Trp Thr Asn Ser Tyr Ser Asp 
1590 1595 1600 

TAC CGA GTT GGC TGC TTC AAG TCA GGA AAG TAC AAG ATT GTT TTG GAC 2473 
Tyr Arg Val Gly Cys Phe Lys Ser Gly Lys Tyr Lys Ile Val Leu Asp 
1605 1610 1615 

[illegible] GAT GAT GGC TTG TTT GGA GGC TTC AAC AGG CTT AGT CAT GAT GCC 2521 
Ser Asp Asp Gly Leu Phe Gly Gly Phe Asn Arg Leu Ser His Asp Ala 
1620 1625 1630 

GAG CAC TTC ACC TTT GAC GGG TGG TAT GAT AAC CGG CCT CGG TCC TTC 2569 
Glu His Phe Thr Phe Asp Gly Trp Tyr Asp Asn Arg Pro Arg Ser Phe 
1635 1640 1645 1650 

ATG GTA TAT GCA CCA TCT AGG ACA GCA GTG GTC TAT GCT TTA GTA GAA 2617 
Met Val Tyr Ala Pro Ser Arg Thr Ala Val Val Tyr Ala Leu Val Glu 
1655 1660 1665 

GAT GAA GAG AAT GAA GCA GAG AAT GAA GTA GAA AGT GAA GTG AAA CCA 2665 
Asp Glu Glu Asn Glu Ala Glu Asn Glu Val Glu Ser Glu Val Lys Pro 
1670 1675 1680 

GCC TCC GGC TGA GATAGATATT TAGTAAGAGG ATOCCCTAAA GCAGGAATGG 2717 
Ala Ser Gly * 
1685 

TTAACCTGTG CATCTGCATT GAACGACGTA TATTGAGACT GGAAATCCAT ATGACTAGTA 2777 
GATCCTCTAG AGTCGACCTG CAGGCATG _ 2805 
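
The CDS feature stated for SEQ ID NO: 30 (location 131..2677) can be cross-checked against the length of the protein listing that follows. The sketch below assumes 1-based, inclusive coordinates, as is usual for such feature locations; the helper name `cds_codon_count` is illustrative only.

```python
def cds_codon_count(start: int, end: int) -> int:
    """Return the number of codons in a 1-based, inclusive CDS span."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError(f"span {start}..{end} is not a whole number of codons")
    return length // 3


# CDS location given in the feature table for SEQ ID NO: 30.
codons = cds_codon_count(131, 2677)
print(codons)      # codons including the terminal stop codon
print(codons - 1)  # amino acid residues encoded before the stop
```

On these assumptions the span holds 849 codons, which agrees with the 849 positions of SEQ ID NO: 31 when the terminal stop (*) is counted as the last position.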

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 849 amino acids 

(B) TYPE: amino acid 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

Met Val Tyr Tyr Thr Val Ser Gly Ile Arg Phe Pro Cys Ala Pro Ser 
1 5 10 15 

Leu Tyr Lys Ser Gln Leu Thr Ser Phe His Gly Gly Arg Arg Thr Ser 
20 25 30 

Ser Gly Leu Ser Phe Leu Leu Lys Lys Glu Leu Phe Pro Arg Lys Ile 
35 40 45 

Phe Ala Gly Lys Ser Ser Tyr Glu Ser Asp Ser Ser Asn Leu Thr Val 
50 55 60 

Ser Ala Ser Glu Lys Val Leu Val Pro Asp Asp Gln Ile Asp Gly Ser 
65 70 75 80 

Ser Ser Ser Thr Tyr Gln Leu Glu Thr Thr Gly Thr Val Leu Glu Glu 
85 90 95 

Ser Gln Val Leu Gly Asp Ala Glu Ser Leu Val Met Glu Asp Asp Lys 
100 105 110 

Asn Val Glu Glu Asp Glu Val Lys Lys Glu Ser Val Pro Leu His Glu 
115 120 125 

Thr Ile Ser Ile Gly Lys Ser Glu Ser Lys Pro Arg Ser Ile Pro Pro 
130 135 140 

Pro Gly Ser Gly Gln Arg Ile Tyr Asp Ile Asp Pro Ser Leu Ala Gly 
145 150 155 160 

Phe Arg Gln His Leu Asp Tyr Arg Tyr Ser Gln Tyr Lys Arg Leu Arg 
165 170 175 

Glu Glu Ile Asp Lys Tyr Glu Gly Gly Leu Asp Ala Phe Ser Arg Gly 
180 185 190 

Phe Glu Lys Phe Gly Phe Leu Arg Ser Glu Thr Gly Ile Thr Tyr Arg 
195 200 205 

Glu Trp Ala Pro Gly Ala Thr Trp Ala Ala Leu Ile Gly Asp Phe Asn 
210 215 220 

Asn Trp Asn Pro Asn Ala Asp Val Met Thr Arg Asn Glu Phe Gly Val 
225 230 235 240 

Trp Glu Ile Phe Leu Pro Asn Asn Ala Asp Gly Ser Pro Pro Ile Pro 
245 250 255 

His Gly Ser Arg Val Lys Ile Arg Met Asp Thr Pro Ser Gly Ile Lys 
260 265 270 

Asp Ser Ile Pro Ala Trp Ile Lys Phe Ser Val Gln Ala Pro Gly Glu 
275 280 285 

Ile Pro Tyr Asn Ala Ile Tyr Tyr Asp Pro Pro Lys Glu Glu Lys Tyr 
290 295 300 

Val Phe Lys His Pro Gln Pro Lys Arg Pro Lys Ser Leu Arg Ile Tyr 
305 310 315 320 

Glu Ser His Val Gly Met Ser Ser Met Glu Pro Ile Ile Asn Thr Tyr 
325 330 335 

Ala Asn Phe Arg Asp Asp Met Leu Pro Arg Ile Lys Lys Leu Gly Tyr 
340 345 350 



Asn Ala Val Gln Ile Met Ala Ile Gln Glu His Ser Tyr Tyr Ala Ser 
355 360 365 

Phe Gly Tyr His Val Thr Asn Phe Phe Ala Pro Ser Ser Arg Phe Gly 
370 375 380 

Thr Pro Asp Asp Leu Lys Ser Leu Ile Asp Lys Ala His Glu Leu Gly 
385 390 395 400 

Leu Leu Val Leu Met Asp Ile Val His Ser His Ala Ser Asn Asn Thr 
405 410 415 

Leu Asp Gly Leu Asn Met Phe Asp Gly Thr Asp Ser His Tyr Phe His 
420 425 430 

Ser Gly Ser Arg Gly His His Trp Leu Trp Asp Ser Arg Leu Phe Asn 
435 440 445 

Tyr Gly Ser Trp Glu Val Leu Arg Phe Leu Leu Ser Asn Ala Arg Trp 
450 455 460 

Trp Leu Glu Glu Tyr Arg Phe Asp Gly Phe Arg Phe Asp Gly Val Thr 
465 470 475 480 

Ser Met Met Tyr Thr Pro His Gly Leu Gln Val Ala Phe Thr Gly Asn 
485 490 495 

Tyr Asn Glu Tyr Phe Gly Tyr Ala Thr Asp Val Asp Ala Val Ile Tyr 
500 505 510 

Leu Met Leu Val Asn Asp Met Ile His Gly Leu Phe Pro Glu Ala Val 
515 520 525 

Thr Ile Gly Glu Asp Val Ser Gly Lys Pro Thr Phe Cys Ile Pro Val 
530 535 540 

Glu Asp Gly Gly Val Gly Phe Asp Tyr Arg Leu His Met Ala Ile Ala 
545 550 555 560 

Asp Lys Trp Ile Glu Ile Leu Lys Lys Arg Asp Glu Asp Trp Lys Met 
565 570 575 

Gly Asp Ile Val His Thr Leu Thr Asn Arg Arg Trp Leu Glu Lys Cys 
580 585 590 

Val Ala Tyr Ala Glu Ser His Asp Gln Ala Leu Val Gly Asp Lys Thr 
595 600 605 

Ile Ala Phe Trp Leu Met Asp Lys Asp Met Tyr Asp Phe Met Ala Arg 
610 615 620 

Asp Arg Pro Ser Thr Pro Leu Ile Asp Arg Gly Ile Ala Leu His Lys 
625 630 635 640 

Met Ile Arg Leu Ile Thr Met Gly Leu Gly Gly Glu Gly Tyr Leu Asn 
645 650 655 

Phe Met Gly Asn Glu Phe Gly His Pro Glu Trp Ile Asp Phe Pro Arg 
660 665 670 

Gly Asp Arg His Leu Pro Asn Gly Lys Val Ile Pro Gly Asn Asn His 
675 680 685 

Ser Tyr Asp Lys Cys Arg Arg Arg Phe Asp Leu Gly Asp Ala Asp Tyr 
690 695 700 

Leu Arg Tyr His Gly Met Gln Glu Phe Asp Gln Ala Met Gln His Leu 
705 710 715 720 

Glu Glu Ala Tyr Gly Phe Met Thr Ser Glu His Gln Tyr Ile Ser Arg 
725 730 735 

Lys Asp Glu Gly Asp Arg Ile Ile Val Phe Glu Arg Gly Asn Leu Val 
740 745 750 

Phe Val Phe Asn Phe His Trp Thr Asn Ser Tyr Ser Asp Tyr Arg Val 
755 760 765 

Gly Cys Phe Lys Ser Gly Lys Tyr Lys Ile Val Leu Asp Ser Asp Asp 
770 775 780 

Gly Leu Phe Gly Gly Phe Asn Arg Leu Ser His Asp Ala Glu His Phe 
785 790 795 800 

Thr Phe Asp Gly Trp Tyr Asp Asn Arg Pro Arg Ser Phe Met Val Tyr 
805 810 815 

Ala Pro Ser Arg Thr Ala Val Val Tyr Ala Leu Val Glu Asp Glu Glu 
820 825 830 

Asn Glu Ala Glu Asn Glu Val Glu Ser Glu Val Lys Pro Ala Ser Gly 
835 840 845 

* 

CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS: 

1. A nucleic acid sequence encoding a polypeptide, the encoded 
polypeptide comprising at least a portion of the amino acid sequence shown in 
Figure 4 or Figure 13 which retains sufficient starch branching enzyme (SBE) 
activity when expressed in E. coli KV 832 to complement the branching enzyme 
mutation therein; at least 200 bp; and exhibiting at least 88% sequence identity 
with the corresponding region of the DNA sequence shown in Figures 4, 9, 10 or 
13, operably linked in the sense or anti-sense orientation to a promoter operable 
in plants. 

2. A nucleic acid sequence according to claim 1, comprising nucleotides 21 - 
2531 of the nucleic acid sequence shown in Figure 4, or a functionally equivalent 
nucleotide sequence which [illegible] 
the nucleic acid sequence shown in Figure 4. 

3. A nucleic acid sequence according to claim 1, comprising nucleotides 131 - 
2677 of the nucleic acid sequence shown [illegible] 
sequence which hybridises under [illegible] 
acid sequence shown in Figure 13. 

4. A nucleic acid sequence according to any one of claims 1, 2 or 3 comprising 
a 5' and/or a 3' untranslated region. 

5. A nucleic acid sequence according to any one of claims 1 [illegible] encoding a 
polypeptide having the amino acid sequence NSKH at about residue 687. 

6. A nucleic acid sequence comprising at least 200 bp and exhibiting at least 
88% sequence identity with the corresponding region of the DNA sequence 
shown in Figures 4, 9, 10 or 13, operably linked in the sense or anti-sense 
orientation to a promoter operable in plants. 

7. A nucleic acid sequence according to [illegible] 
600 bp. 

8. A sequence according to claim 6 or 7, comprising a 5' and/or 3' untranslated 
region. 

9. A sequence according to claim 8, comprising nucleotides 688-1044 of the 
sequence shown in Figure 9, and/or nucleotides 1507-1900 of the sequence 
shown in Figure 10. 

10. A sequence according to claim 6, comprising the nucleotide sequence 
shown in Figure 10. 

11. A replicable nucleic acid [illegible] 
according to any one of the preceding claims. 

12. A polypeptide comprising at least a portion of the amino acid sequence 
shown in Figure 4 or Figure 13 which retains sufficient starch branching enzyme 
(SBE) activity when expressed in E. coli KV 832 to complement the branching 
enzyme mutation therein; [illegible] 
exhibiting at least 88% sequence [illegible] 
sequence shown in Figures 4, 9, 10 [illegible] 
sense orientation to a promoter operable in plants. 

13. A polypeptide according to claim 12, in substantial isolation from [illegible] 
polypeptides. 

14. A polypeptide according to claim 12 [illegible] 
NSKH at about position 697. 

15. A method of modifying starch [illegible] 
to be modified under suitable [illegible] 
according to any one of claims 12, 13 or 14. 

16. A method of altering a plant [illegible] 
the cell a nucleic acid sequence comprising at least 200 bp and [illegible] 
88% sequence identity with the [illegible] 
shown in Figures 4, 9, 10 or 13, operably linked in the sense or anti-sense 
orientation to a suitable promoter [illegible] in the host cell, and causing [illegible] 
the introduced nucleotide [illegible] 
thereof being sufficient to interfere with the expression of a [illegible] 
naturally present in the host cell, [illegible] 
having SBE activity. 

17. A method according to claim 16, wherein the host cell is from a cassava, 
banana, potato, pea, tomato, maize, wheat, barley, oat, sweet potato or rice plant. 

18. A method according to claim 16 or 17, comprising the [illegible] 
more further [illegible] 
orientation to a suitable promoter active in the host [illegible] 
the one or more further nucleic [illegible] 
products thereof being sufficient to interfere with the expression of homologous 
gene(s) present in the host cell. 

19. A method according to [illegible] 
sequences interfere with the expression [illegible] 

20. A method according to claim 18 or 19, wherein the further [illegible] 
sequence comprises at least part of an SBE I gene. 

21. A method according to claim [illegible] 
comprises at least part of the cassava SBE I gene. 

22. A method according to any one of [illegible] 
selected from one of the following: cassava, banana, potato, pea, tomato, maize, 
wheat, barley, oat, sweet potato or rice. 

23. A method according to any one of claims [illegible] 
cell gives rise to starch having different [illegible] 
unaltered cell. 

24. A method according to any one of claims [illegible] 
of growing the altered host cell into a plant or plantlet. 

25. A method of obtaining starch having altered properties, comprising growing 
a plant from an altered host cell [illegible] 
the starch therefrom. 

26. A plant [illegible] cell into which has been [illegible] 
sequence comprising at least 200 bp and exhibiting at least 88% sequence 
identity with the corresponding region of the DNA sequence shown in Figures 4, 9, 
10, or 13, operably linked in the sense or anti-sense orientation to a promoter 
operable in plants, or the progeny thereof. 

27. A plant according to claim 26, altered by the [illegible] 
16-22. 

28. Starch obtainable from [illegible] 
altered properties compared to starch extracted from an [illegible] 
plant. 

29. Starch obtained from an altered plant according to claim 26 or 27, [illegible] 
altered properties compared to starch extracted from an equivalent [illegible] 
plant. 

30. Starch according to [illegible] 
from the group consisting of cassava, banana, potato, pea, tomato, maize, wheat, 
barley, oat, sweet potato and rice plants. 

31. Starch according to any one of claims 28, 29 or 30, having increased 
amylose content compared to starch extracted from an equivalent [illegible] 
plant. 

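The claims recite a threshold of at least 200 bp with at least 88% sequence identity to the corresponding region of a reference sequence, without stating how identity is to be computed. The following is a minimal sketch of one possible reading, assuming the two sequences have already been aligned without gaps and are of equal length; the function names and default parameters are illustrative and are not taken from the specification.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two equal-length, pre-aligned sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_threshold(candidate: str, reference: str,
                    min_length: int = 200, min_identity: float = 88.0) -> bool:
    """Check the length and identity thresholds for an ungapped comparison."""
    return (len(candidate) >= min_length
            and percent_identity(candidate, reference) >= min_identity)


# Hypothetical 200-nt reference and a copy carrying 20 substitutions.
reference = "ACGT" * 50
mutated = list(reference)
for i in range(20):
    pos = i * 10
    mutated[pos] = "C" if reference[pos] == "A" else "A"
candidate = "".join(mutated)
print(percent_identity(candidate, reference), meets_threshold(candidate, reference))
```

A gapped alignment, or a different choice of denominator, would give different figures, so this ungapped comparison should be read as one convention among several.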
Dated this 21st day of December 2000 

NATIONAL STARCH AND CHEMICAL INVESTMENT 
HOLDING CORPORATION 

By their Patent Attorneys 
ILUSON & CO 

Fig. 2. 

[Figure: nucleotide sequence with single-letter amino acid translation; the text of the figure is not legible in this copy.] 

Fig. 2 (Cont). 

[Figure: continuation of the nucleotide sequence with single-letter amino acid translation; the text of the figure is not legible in this copy.] 

Fig. 4. 

[Figure: nucleotide sequence with single-letter amino acid translation; the text of the figure is not legible in this copy.] 

Fig.4 (Cont).



[Nucleotide sequence, continued. The sequence data is not legible in this copy.]
Fig.5.

[Alignment of two nucleotide sequences, labelled "125*94.seq" and "116.seq" as the labels read in the scan. The upper ruler numbers the first sequence from position 60 to 960 and the lower ruler numbers the second from position 1140 to 2040; the line between the two sequences shows the bases they share. The base-level data is not reliably legible in this copy.]
Fig.5 (Cont).



[Alignment of the same two nucleotide sequences, continued. The upper ruler runs from position 970 to 1450 and the lower ruler from position 2050 to 2530; the line between the two sequences shows the bases they share. The base-level data is not reliably legible in this copy.]
Fig.6.

[Alignment of two amino acid sequences, labelled "125-94.pro" and "116.pro" as the labels read in the scan. The upper ruler numbers the first sequence from position 10 to 350 and the lower ruler numbers the second from position 370 to 710; the line between the two sequences marks the residues they share. The residue-level data is not reliably legible in this copy.]